multi-linear regression model: Topics by Science.gov

Sample records for multi-linear regression model

A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

NASA Astrophysics Data System (ADS)

Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

2018-04-01

In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
Genomic prediction based on data from three layer lines using non-linear regression models.

PubMed

Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

2014-11-06

Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
Prediction models for CO2 emission in Malaysia using best subsets regression and multi-linear regression

NASA Astrophysics Data System (ADS)

Tan, C. H.; Matjafri, M. Z.; Lim, H. S.

2015-10-01

This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
Delineating chalk sand distribution of Ekofisk formation using probabilistic neural network (PNN) and stepwise regression (SWR): Case study Danish North Sea field

NASA Astrophysics Data System (ADS)

Haris, A.; Nafian, M.; Riyanto, A.

2017-07-01

Danish North Sea Fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) that was started from the age of Paleocene to Miocene. In this study, the integration of seismic and well log data set is carried out to determine the chalk sand distribution in the Danish North Sea field. The integration of seismic and well log data set is performed by using the seismic inversion analysis and seismic multi-attribute. The seismic inversion algorithm, which is used to derive acoustic impedance (AI), is model-based technique. The derived AI is then used as external attributes for the input of multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate the linear and non-linear transformation of among well log properties. In the case of the linear model, selected transformation is conducted by weighting step-wise linear regression (SWR), while for the non-linear model is performed by using probabilistic neural networks (PNN). The estimated porosity, which is resulted by PNN shows better suited to the well log data compared with the results of SWR. This result can be understood since PNN perform non-linear regression so that the relationship between the attribute data and predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity value ranging from 23% up to 30%.
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case

NASA Astrophysics Data System (ADS)

Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann

2017-04-01

Short-term ocean analyses for Sea Surface Temperature SST in the Mediterranean Sea can be improved by a statistical post-processing technique, called super-ensemble. This technique consists in a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable to prevent overfitting problems, even if best performances are achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performances depend on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: 15 days training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least square algorithm being filtered a posteriori.
Comprehensive ripeness-index for prediction of ripening level in mangoes by multivariate modelling of ripening behaviour

NASA Astrophysics Data System (ADS)

Eyarkai Nambi, Vijayaram; Thangavel, Kuladaisamy; Manickavasagan, Annamalai; Shahir, Sultan

2017-01-01

Prediction of ripeness level in climacteric fruits is essential for post-harvest handling. An index capable of predicting ripening level with minimum inputs would be highly beneficial to the handlers, processors and researchers in fruit industry. A study was conducted with Indian mango cultivars to develop a ripeness index and associated model. Changes in physicochemical, colour and textural properties were measured throughout the ripening period and the period was classified into five stages (unripe, early ripe, partially ripe, ripe and over ripe). Multivariate regression techniques like partial least square regression, principal component regression and multi linear regression were compared and evaluated for its prediction. Multi linear regression model with 12 parameters was found more suitable in ripening prediction. Scientific variable reduction method was adopted to simplify the developed model. Better prediction was achieved with either 2 or 3 variables (total soluble solids, colour and acidity). Cross validation was done to increase the robustness and it was found that proposed ripening index was more effective in prediction of ripening stages. Three-variable model would be suitable for commercial applications where reasonable accuracies are sufficient. However, 12-variable model can be used to obtain more precise results in research and development applications.
Selection of higher order regression models in the analysis of multi-factorial transcription data.

PubMed

Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim

2014-01-01

Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Breeding value accuracy estimates for growth traits using random regression and multi-trait models in Nelore cattle.

PubMed

Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G

2011-06-28

We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditionally finite multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
Finite-sample and asymptotic sign-based tests for parameters of non-linear quantile regression with Markov noise

NASA Astrophysics Data System (ADS)

Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.

2017-01-01

One of the most noticeable features of sign-based statistical procedures is an opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we expanded a sing-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypothesis not only of regression parameters, but of noise parameters as well.
Multi-Axis Identifiability Using Single-Surface Parameter Estimation Maneuvers on the X-48B Blended Wing Body

NASA Technical Reports Server (NTRS)

Ratnayake, Nalin A.; Koshimoto, Ed T.; Taylor, Brian R.

2011-01-01

The problem of parameter estimation on hybrid-wing-body type aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aero- dynamic control effectors that act in coplanar motion. This fact adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of system inputs must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, asymmetric, single-surface maneuvers are used to excite multiple axes of aircraft motion simultaneously. Time history reconstructions of the moment coefficients computed by the solved regression models are then compared to each other in order to assess relative model accuracy. The reduced flight-test time required for inner surface parameter estimation using multi-axis methods was found to come at the cost of slightly reduced accuracy and statistical confidence for linear regression methods. Since the multi-axis maneuvers captured parameter estimates similar to both longitudinal and lateral-directional maneuvers combined, the number of test points required for the inner, aileron-like surfaces could in theory have been reduced by 50%. While trends were similar, however, individual parameters as estimated by a multi-axis model were typically different by an average absolute difference of roughly 15-20%, with decreased statistical significance, than those estimated by a single-axis model. The multi-axis model exhibited an increase in overall fit error of roughly 1-5% for the linear regression estimates with respect to the single-axis model, when applied to flight data designed for each, respectively.
USING HYDROGRAPHIC DATA AND THE EPA VIRTUAL BEACH MODEL TO TEST PREDICTIONS OF BEACH BACTERIA CONCENTRATIONS

EPA Science Inventory

A modeling study of 2006 Huntington Beach (Lake Erie) beach bacteria concentrations indicates multi-variable linear regression (MLR) can effectively estimate bacteria concentrations compared to the persistence model. Our use of the Virtual Beach (VB) model affirms that fact. VB i...
Using Simulation Technique to overcome the multi-collinearity problem for estimating fuzzy linear regression parameters.

NASA Astrophysics Data System (ADS)

Mansoor Gorgees, Hazim; Hilal, Mariam Mohammed

2018-05-01

Fatigue cracking is one of the common types of pavement distresses and is an indicator of structural failure; cracks allow moisture infiltration, roughness, may further deteriorate to a pothole. Some causes of pavement deterioration are: traffic loading; environment influences; drainage deficiencies; materials quality problems; construction deficiencies and external contributors. Many researchers have made models that contain many variables like asphalt content, asphalt viscosity, fatigue life, stiffness of asphalt mixture, temperature and other parameters that affect the fatigue life. For this situation, a fuzzy linear regression model was employed and analyzed by using the traditional methods and our proposed method in order to overcome the multi-collinearity problem. The total spread error was used as a criterion to compare the performance of the studied methods. Simulation program was used to obtain the required results.
Multivariate meta-analysis for non-linear and other multi-parameter associations

PubMed Central

Gasparrini, A; Armstrong, B; Kenward, M G

2012-01-01

In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data

PubMed Central

Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.

2013-01-01

Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

PubMed Central

Huang, Jian; Zhang, Cun-Hui

2013-01-01

The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
Multi-sensory landscape assessment: the contribution of acoustic perception to landscape evaluation.

PubMed

Gan, Yonghong; Luo, Tao; Breitung, Werner; Kang, Jian; Zhang, Tianhai

2014-12-01

In this paper, the contribution of visual and acoustic preference to multi-sensory landscape evaluation was quantitatively compared. The real landscapes were treated as dual-sensory ambiance and separated into visual landscape and soundscape. Both were evaluated by 63 respondents in laboratory conditions. The analysis of the relationship between respondent's visual and acoustic preference as well as their respective contribution to landscape preference showed that (1) some common attributes are universally identified in assessing visual, aural and audio-visual preference, such as naturalness or degree of human disturbance; (2) with acoustic and visual preferences as variables, a multi-variate linear regression model can satisfactorily predict landscape preference (R(2 )= 0.740), while the coefficients of determination for a unitary linear regression model were 0.345 and 0.720 for visual and acoustic preference as predicting factors, respectively; (3) acoustic preference played a much more important role in landscape evaluation than visual preference in this study (the former is about 4.5 times of the latter), which strongly suggests a rethinking of the role of soundscape in environment perception research and landscape planning practice.
Modeling of thermal degradation kinetics of the C-glucosyl xanthone mangiferin in an aqueous model solution as a function of pH and temperature and protective effect of honeybush extract matrix.

PubMed

Beelders, Theresa; de Beer, Dalene; Kidd, Martin; Joubert, Elizabeth

2018-01-01

Mangiferin, a C-glucosyl xanthone, abundant in mango and honeybush, is increasingly targeted for its bioactive properties and thus to enhance functional properties of food. The thermal degradation kinetics of mangiferin at pH3, 4, 5, 6 and 7 were each modeled at five temperatures ranging between 60 and 140°C. First-order reaction models were fitted to the data using non-linear regression to determine the reaction rate constant at each pH-temperature combination. The reaction rate constant increased with increasing temperature and pH. Comparison of the reaction rate constants at 100°C revealed an exponential relationship between the reaction rate constant and pH. The data for each pH were also modeled with the Arrhenius equation using non-linear and linear regression to determine the activation energy and pre-exponential factor. Activation energies decreased slightly with increasing pH. Finally, a multi-linear model taking into account both temperature and pH was developed for mangiferin degradation. Sterilization (121°C for 4min) of honeybush extracts dissolved at pH4, 5 and 7 did not cause noticeable degradation of mangiferin, although the multi-linear model predicted 34% degradation at pH7. The extract matrix is postulated to exert a protective effect as changes in potential precursor content could not fully explain the stability of mangiferin. Copyright © 2017 Elsevier Ltd. All rights reserved.
Combined genetic algorithm and multiple linear regression (GA-MLR) optimizer: Application to multi-exponential fluorescence decay surface.

PubMed

Fisz, Jacek J

2006-12-07

The optimization approach based on the genetic algorithm (GA) combined with multiple linear regression (MLR) method, is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only one strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of recently introduced the GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi-linear combinations of nonlinear functions, is indicated. The VP algorithm does not distinguish the weakly nonlinear parameters from the nonlinear ones and it does not apply to the model functions which are multi-linear combinations of nonlinear functions.
Virtual Beach v2.2 User Guide

EPA Science Inventory

Virtual Beach version 2.2 (VB 2.2) is a decision support tool. It is designed to construct site-specific Multi-Linear Regression (MLR) models to predict pathogen indicator levels (or fecal indicator bacteria, FIB) at recreational beaches. MLR analysis has outperformed persisten...
Modeling of bromate formation by ozonation of surface waters in drinking water treatment.

PubMed

Legube, Bernard; Parinet, Bernard; Gelinet, Karine; Berne, Florence; Croue, Jean-Philippe

2004-04-01

The main objective of this paper is to try to develop statistically and chemically rational models for bromate formation by ozonation of clarified surface waters. The results presented here show that bromate formation by ozonation of natural waters in drinking water treatment is directly proportional to the "Ct" value ("Ctau" in this study). Moreover, this proportionality strongly depends on many parameters: increasing of pH, temperature and bromide level leading to an increase of bromate formation; ammonia and dissolved organic carbon concentrations causing a reverse effect. Taking into account limitation of theoretical modeling, we proposed to predict bromate formation by stochastic simulations (multi-linear regression and artificial neural networks methods) from 40 experiments (BrO(3)(-) vs. "Ctau") carried out with three sand filtered waters sampled on three different waterworks. With seven selected variables we used a simple architecture of neural networks, optimized by "neural connection" of SPSS Inc./Recognition Inc. The bromate modeling by artificial neural networks gives better result than multi-linear regression. The artificial neural networks model allowed us classifying variables by decreasing order of influence (for the studied cases in our variables scale): "Ctau", [N-NH(4)(+)], [Br(-)], pH, temperature, DOC, alkalinity.

Time-resolved flow reconstruction with indirect measurements using regression models and Kalman-filtered POD ROM

NASA Astrophysics Data System (ADS)

Leroux, Romain; Chatellier, Ludovic; David, Laurent

2018-01-01

This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using a time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of {20}°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.
Monthly monsoon rainfall forecasting using artificial neural networks

NASA Astrophysics Data System (ADS)

Ganti, Ravikumar

2014-10-01

Indian agriculture sector heavily depends on monsoon rainfall for successful harvesting. In the past, prediction of rainfall was mainly performed using regression models, which provide reasonable accuracy in the modelling and forecasting of complex physical systems. Recently, Artificial Neural Networks (ANNs) have been proposed as efficient tools for modelling and forecasting. A feed-forward multi-layer perceptron type of ANN architecture trained using the popular back-propagation algorithm was employed in this study. Other techniques investigated for modeling monthly monsoon rainfall include linear and non-linear regression models for comparison purposes. The data employed in this study include monthly rainfall and monthly average of the daily maximum temperature in the North Central region in India. Specifically, four regression models and two ANN model's were developed. The performance of various models was evaluated using a wide variety of standard statistical parameters and scatter plots. The results obtained in this study for forecasting monsoon rainfalls using ANNs have been encouraging. India's economy and agricultural activities can be effectively managed with the help of the availability of the accurate monsoon rainfall forecasts.
Differential gene expression detection and sample classification using penalized linear regression models.

PubMed

Wu, Baolin

2006-02-15

Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
Boosting multi-state models.

PubMed

Reulen, Holger; Kneib, Thomas

2016-04-01

One important goal in multi-state modelling is to explore information about conditional transition-type-specific hazard rate functions by estimating influencing effects of explanatory variables. This may be performed using single transition-type-specific models if these covariate effects are assumed to be different across transition-types. To investigate whether this assumption holds or whether one of the effects is equal across several transition-types (cross-transition-type effect), a combined model has to be applied, for instance with the use of a stratified partial likelihood formulation. Here, prior knowledge about the underlying covariate effect mechanisms is often sparse, especially about ineffectivenesses of transition-type-specific or cross-transition-type effects. As a consequence, data-driven variable selection is an important task: a large number of estimable effects has to be taken into account if joint modelling of all transition-types is performed. A related but subsequent task is model choice: is an effect satisfactory estimated assuming linearity, or is the true underlying nature strongly deviating from linearity? This article introduces component-wise Functional Gradient Descent Boosting (short boosting) for multi-state models, an approach performing unsupervised variable selection and model choice simultaneously within a single estimation run. We demonstrate that features and advantages in the application of boosting introduced and illustrated in classical regression scenarios remain present in the transfer to multi-state models. As a consequence, boosting provides an effective means to answer questions about ineffectiveness and non-linearity of single transition-type-specific or cross-transition-type effects.
[Research on the method of interference correction for nondispersive infrared multi-component gas analysis].

PubMed

Sun, You-Wen; Liu, Wen-Qing; Wang, Shi-Mei; Huang, Shu-Hua; Yu, Xiao-Man

2011-10-01

A method of interference correction for nondispersive infrared multi-component gas analysis was described. According to the successive integral gas absorption models and methods, the influence of temperature and air pressure on the integral line strengths and linetype was considered, and based on Lorentz detuning linetypes, the absorption cross sections and response coefficients of H2O, CO2, CO, and NO on each filter channel were obtained. The four dimension linear regression equations for interference correction were established by response coefficients, the absorption cross interference was corrected by solving the multi-dimensional linear regression equations, and after interference correction, the pure absorbance signal on each filter channel was only controlled by the corresponding target gas concentration. When the sample cell was filled with gas mixture with a certain concentration proportion of CO, NO and CO2, the pure absorbance after interference correction was used for concentration inversion, the inversion concentration error for CO2 is 2.0%, the inversion concentration error for CO is 1.6%, and the inversion concentration error for NO is 1.7%. Both the theory and experiment prove that the interference correction method proposed for NDIR multi-component gas analysis is feasible.
Multi-decadal trend and space-time variability of sea level over the Indian Ocean since the 1950s: impact of decadal climate modes

NASA Astrophysics Data System (ADS)

Han, W.; Stammer, D.; Meehl, G. A.; Hu, A.; Sienz, F.

2016-12-01

Sea level varies on decadal and multi-decadal timescales over the Indian Ocean. The variations are not spatially uniform, and can deviate considerably from the global mean sea level rise (SLR) due to various geophysical processes. One of these processes is the change of ocean circulation, which can be partly attributed to natural internal modes of climate variability. Over the Indian Ocean, the most influential climate modes on decadal and multi-decadal timescales are the Interdecadal Pacific Oscillation (IPO) and decadal variability of the Indian Ocean dipole (IOD). Here, we first analyze observational datasets to investigate the impacts of IPO and IOD on spatial patterns of decadal and interdecadal (hereafter decal) sea level variability & multi-decadal trend over the Indian Ocean since the 1950s, using a new statistical approach of Bayesian Dynamical Linear regression Model (DLM). The Bayesian DLM overcomes the limitation of "time-constant (static)" regression coefficients in conventional multiple linear regression model, by allowing the coefficients to vary with time and therefore measuring "time-evolving (dynamical)" relationship between climate modes and sea level. For the multi-decadal sea level trend since the 1950s, our results show that climate modes and non-climate modes (the part that cannot be explained by climate modes) have comparable contributions in magnitudes but with different spatial patterns, with each dominating different regions of the Indian Ocean. For decadal variability, climate modes are the major contributors for sea level variations over most region of the tropical Indian Ocean. The relative importance of IPO and decadal variability of IOD, however, varies spatially. For example, while IOD decadal variability dominates IPO in the eastern equatorial basin (85E-100E, 5S-5N), IPO dominates IOD in causing sea level variations in the tropical southwest Indian Ocean (45E-65E, 12S-2S). To help decipher the possible contribution of external forcing to the multi-decadal sea level trend and decadal variability, we also analyze the model outputs from NCAR's Community Earth System Model (CESM) Large Ensemble Experiments, and compare the results with our observational analyses.
On approaches to analyze the sensitivity of simulated hydrologic fluxes to model parameters in the community land model

DOE PAGES

Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...

2015-12-04

Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalizedmore » linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.« less
Hourly predictive Levenberg-Marquardt ANN and multi linear regression models for predicting of dew point temperature

NASA Astrophysics Data System (ADS)

Zounemat-Kermani, Mohammad

2012-08-01

In this study, the ability of two models of multi linear regression (MLR) and Levenberg-Marquardt (LM) feed-forward neural network was examined to estimate the hourly dew point temperature. Dew point temperature is the temperature at which water vapor in the air condenses into liquid. This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evapotranspiration and in investigating agronomical issues as stomatal closure in plants. The availability of hourly records of climatic data (air temperature, relative humidity and pressure) which could be used to predict dew point temperature initiated the practice of modeling. Additionally, the wind vector (wind speed magnitude and direction) and conceptual input of weather condition were employed as other input variables. The three quantitative standard statistical performance evaluation measures, i.e. the root mean squared error, mean absolute error, and absolute logarithmic Nash-Sutcliffe efficiency coefficient ( {| {{{Log}}({{NS}})} |} ) were employed to evaluate the performances of the developed models. The results showed that applying wind vector and weather condition as input vectors along with meteorological variables could slightly increase the ANN and MLR predictive accuracy. The results also revealed that LM-NN was superior to MLR model and the best performance was obtained by considering all potential input variables in terms of different evaluation criteria.
Comparison of buried sand ridges and regressive sand ridges on the outer shelf of the East China Sea

NASA Astrophysics Data System (ADS)

Wu, Ziyin; Jin, Xianglong; Zhou, Jieqiong; Zhao, Dineng; Shang, Jihong; Li, Shoujun; Cao, Zhenyi; Liang, Yuyang

2017-06-01

Based on multi-beam echo soundings and high-resolution single-channel seismic profiles, linear sand ridges in U14 and U2 on the East China Sea (ECS) shelf are identified and compared in detail. Linear sand ridges in U14 are buried sand ridges, which are 90 m below the seafloor. It is presumed that these buried sand ridges belong to the transgressive systems tract (TST) formed 320-200 ka ago and that their top interface is the maximal flooding surface (MFS). Linear sand ridges in U2 are regressive sand ridges. It is presumed that these buried sand ridges belong to the TST of the last glacial maximum (LGM) and that their top interface is the MFS of the LGM. Four sub-stage sand ridges of U2 are discerned from the high-resolution single-channel seismic profile and four strikes of regressive sand ridges are distinguished from the submarine topographic map based on the multi-beam echo soundings. These multi-stage and multi-strike linear sand ridges are the response of, and evidence for, the evolution of submarine topography with respect to sea-level fluctuations since the LGM. Although the difference in the age of formation between U14 and U2 is 200 ka and their sequences are 90 m apart, the general strikes of the sand ridges are similar. This indicates that the basic configuration of tidal waves on the ECS shelf has been stable for the last 200 ka. A basic evolutionary model of the strata of the ECS shelf is proposed, in which sea-level change is the controlling factor. During the sea-level change of about 100 ka, five to six strata are developed and the sand ridges develop in the TST. A similar story of the evolution of paleo-topography on the ECS shelf has been repeated during the last 300 ka.
Differences in Student Evaluations of Limited-Term Lecturers and Full-Time Faculty

ERIC Educational Resources Information Center

Cho, Jeong-Il; Otani, Koichiro; Kim, B. Joon

2014-01-01

This study compared student evaluations of teaching (SET) for limited-term lecturers (LTLs) and full-time faculty (FTF) using a Likert-scaled survey administered to students (N = 1,410) at the end of university courses. Data were analyzed using a general linear regression model to investigate the influence of multi-dimensional evaluation items on…
On the concept of sloped motion for free-floating wave energy converters.

PubMed

Payne, Grégory S; Pascal, Rémy; Vaillant, Guillaume

2015-10-08

A free-floating wave energy converter (WEC) concept whose power take-off (PTO) system reacts against water inertia is investigated herein. The main focus is the impact of inclining the PTO direction on the system performance. The study is based on a numerical model whose formulation is first derived in detail. Hydrodynamics coefficients are obtained using the linear boundary element method package WAMIT. Verification of the model is provided prior to its use for a PTO parametric study and a multi-objective optimization based on a multi-linear regression method. It is found that inclining the direction of the PTO at around 50° to the vertical is highly beneficial for the WEC performance in that it provides a high capture width ratio over a broad region of the wave period range.
On the concept of sloped motion for free-floating wave energy converters

PubMed Central

Payne, Grégory S.; Pascal, Rémy; Vaillant, Guillaume

2015-01-01

A free-floating wave energy converter (WEC) concept whose power take-off (PTO) system reacts against water inertia is investigated herein. The main focus is the impact of inclining the PTO direction on the system performance. The study is based on a numerical model whose formulation is first derived in detail. Hydrodynamics coefficients are obtained using the linear boundary element method package WAMIT. Verification of the model is provided prior to its use for a PTO parametric study and a multi-objective optimization based on a multi-linear regression method. It is found that inclining the direction of the PTO at around 50° to the vertical is highly beneficial for the WEC performance in that it provides a high capture width ratio over a broad region of the wave period range. PMID:26543397
Lateral-Directional Parameter Estimation on the X-48B Aircraft Using an Abstracted, Multi-Objective Effector Model

NASA Technical Reports Server (NTRS)

Ratnayake, Nalin A.; Waggoner, Erin R.; Taylor, Brian R.

2011-01-01

The problem of parameter estimation on hybrid-wing-body aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aerodynamic control effectors that act in coplanar motion. This adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of flight and simulation data must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, time-decorrelation techniques are applied to a model structure selected through stepwise regression for simulated and flight-generated lateral-directional parameter estimation data. A virtual effector model that uses mathematical abstractions to describe the multi-axis effects of clamshell surfaces is developed and applied. Comparisons are made between time history reconstructions and observed data in order to assess the accuracy of the regression model. The Cram r-Rao lower bounds of the estimated parameters are used to assess the uncertainty of the regression model relative to alternative models. Stepwise regression was found to be a useful technique for lateral-directional model design for hybrid-wing-body aircraft, as suggested by available flight data. Based on the results of this study, linear regression parameter estimation methods using abstracted effectors are expected to perform well for hybrid-wing-body aircraft properly equipped for the task.
On the null distribution of Bayes factors in linear regression

USDA-ARS?s Scientific Manuscript database

We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...
Statistical post-processing of seasonal multi-model forecasts: Why is it so hard to beat the multi-model mean?

NASA Astrophysics Data System (ADS)

Siegert, Stefan

2017-04-01

Initialised climate forecasts on seasonal time scales, run several months or even years ahead, are now an integral part of the battery of products offered by climate services world-wide. The availability of seasonal climate forecasts from various modeling centres gives rise to multi-model ensemble forecasts. Post-processing such seasonal-to-decadal multi-model forecasts is challenging 1) because the cross-correlation structure between multiple models and observations can be complicated, 2) because the amount of training data to fit the post-processing parameters is very limited, and 3) because the forecast skill of numerical models tends to be low on seasonal time scales. In this talk I will review new statistical post-processing frameworks for multi-model ensembles. I will focus particularly on Bayesian hierarchical modelling approaches, which are flexible enough to capture commonly made assumptions about collective and model-specific biases of multi-model ensembles. Despite the advances in statistical methodology, it turns out to be very difficult to out-perform the simplest post-processing method, which just recalibrates the multi-model ensemble mean by linear regression. I will discuss reasons for this, which are closely linked to the specific characteristics of seasonal multi-model forecasts. I explore possible directions for improvements, for example using informative priors on the post-processing parameters, and jointly modelling forecasts and observations.
Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

PubMed Central

Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

2012-01-01

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

PubMed

Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

2012-12-01

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Simulation of multi-stage nonlinear bone remodeling induced by fixed partial dentures of different configurations: a comparative clinical and numerical study.

PubMed

Liao, Zhipeng; Yoda, Nobuhiro; Chen, Junning; Zheng, Keke; Sasaki, Keiichi; Swain, Michael V; Li, Qing

2017-04-01

This paper aimed to develop a clinically validated bone remodeling algorithm by integrating bone's dynamic properties in a multi-stage fashion based on a four-year clinical follow-up of implant treatment. The configurational effects of fixed partial dentures (FPDs) were explored using a multi-stage remodeling rule. Three-dimensional real-time occlusal loads during maximum voluntary clenching were measured with a piezoelectric force transducer and were incorporated into a computerized tomography-based finite element mandibular model. Virtual X-ray images were generated based on simulation and statistically correlated with clinical data using linear regressions. The strain energy density-driven remodeling parameters were regulated over the time frame considered. A linear single-stage bone remodeling algorithm, with a single set of constant remodeling parameters, was found to poorly fit with clinical data through linear regression (low [Formula: see text] and R), whereas a time-dependent multi-stage algorithm better simulated the remodeling process (high [Formula: see text] and R) against the clinical results. The three-implant-supported and distally cantilevered FPDs presented noticeable and continuous bone apposition, mainly adjacent to the cervical and apical regions. The bridged and mesially cantilevered FPDs showed bone resorption or no visible bone formation in some areas. Time-dependent variation of bone remodeling parameters is recommended to better correlate remodeling simulation with clinical follow-up. The position of FPD pontics plays a critical role in mechanobiological functionality and bone remodeling. Caution should be exercised when selecting the cantilever FPD due to the risk of overloading bone resorption.
Kinetic analysis of non-isothermal solid-state reactions: multi-stage modeling without assumptions in the reaction mechanism.

PubMed

Pomerantsev, Alexey L; Kutsenova, Alla V; Rodionova, Oxana Ye

2017-02-01

A novel non-linear regression method for modeling non-isothermal thermogravimetric data is proposed. Experiments for several heating rates are analyzed simultaneously. The method is applicable to complex multi-stage processes when the number of stages is unknown. Prior knowledge of the type of kinetics is not required. The main idea is a consequent estimation of parameters when the overall model is successively changed from one level of modeling to another. At the first level, the Avrami-Erofeev functions are used. At the second level, the Sestak-Berggren functions are employed with the goal to broaden the overall model. The method is tested using both simulated and real-world data. A comparison of the proposed method with a recently published 'model-free' deconvolution method is presented.
Application of multi-scale wavelet entropy and multi-resolution Volterra models for climatic downscaling

NASA Astrophysics Data System (ADS)

Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana

2018-01-01

This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations at Krishna Basin, India. Climatic dataset from NCEP is used for training the proposed models (Jan.'69 to Dec.'94) and are applied to corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan.'95-Dec.'05) and forecast (Jan.'06-Dec.'35) periods. The observed precipitation data is obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% variance for each cluster. A multi-resolution non-linear approach combining Discrete Wavelet Transform (DWT) and Second Order Volterra (SoV) is used to model the representative PCs to obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better compared to the traditional Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA, helps reduce the dimensionality of the input climatic variables, while capturing more variability compared to stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods whereas the models with clustering without MWE over-estimate the rainfall during the dry season.

The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

ERIC Educational Resources Information Center

Haberman, Shelby J.; Sinharay, Sandip

2010-01-01

Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Multi-model ensemble estimation of volume transport through the straits of the East/Japan Sea

NASA Astrophysics Data System (ADS)

Han, Sooyeon; Hirose, Naoki; Usui, Norihisa; Miyazawa, Yasumasa

2016-01-01

The volume transports measured at the Korea/Tsushima, Tsugaru, and Soya/La Perouse Straits remain quantitatively inconsistent. However, data assimilation models at least provide a self-consistent budget despite subtle differences among the models. This study examined the seasonal variation of the volume transport using the multiple linear regression and ridge regression of multi-model ensemble (MME) methods to estimate more accurately transport at these straits by using four different data assimilation models. The MME outperformed all of the single models by reducing uncertainties, especially the multicollinearity problem with the ridge regression. However, the regression constants turned out to be inconsistent with each other if the MME was applied separately for each strait. The MME for a connected system was thus performed to find common constants for these straits. The estimation of this MME was found to be similar to the MME result of sea level difference (SLD). The estimated mean transport (2.43 Sv) was smaller than the measurement data at the Korea/Tsushima Strait, but the calibrated transport of the Tsugaru Strait (1.63 Sv) was larger than the observed data. The MME results of transport and SLD also suggested that the standard deviation (STD) of the Korea/Tsushima Strait is larger than the STD of the observation, whereas the estimated results were almost identical to that observed for the Tsugaru and Soya/La Perouse Straits. The similarity between MME results enhances the reliability of the present MME estimation.
Expanding the occupational health methodology: A concatenated artificial neural network approach to model the burnout process in Chinese nurses.

PubMed

Ladstätter, Felix; Garrosa, Eva; Moreno-Jiménez, Bernardo; Ponsoda, Vicente; Reales Aviles, José Manuel; Dai, Junming

2016-01-01

Artificial neural networks are sophisticated modelling and prediction tools capable of extracting complex, non-linear relationships between predictor (input) and predicted (output) variables. This study explores this capacity by modelling non-linearities in the hardiness-modulated burnout process with a neural network. Specifically, two multi-layer feed-forward artificial neural networks are concatenated in an attempt to model the composite non-linear burnout process. Sensitivity analysis, a Monte Carlo-based global simulation technique, is then utilised to examine the first-order effects of the predictor variables on the burnout sub-dimensions and consequences. Results show that (1) this concatenated artificial neural network approach is feasible to model the burnout process, (2) sensitivity analysis is a prolific method to study the relative importance of predictor variables and (3) the relationships among variables involved in the development of burnout and its consequences are to different degrees non-linear. Many relationships among variables (e.g., stressors and strains) are not linear, yet researchers use linear methods such as Pearson correlation or linear regression to analyse these relationships. Artificial neural network analysis is an innovative method to analyse non-linear relationships and in combination with sensitivity analysis superior to linear methods.
Covariance functions for body weight from birth to maturity in Nellore cows.

PubMed

Boligon, A A; Mercadante, M E Z; Forni, S; Lôbo, R B; Albuquerque, L G

2010-03-01

The objective of this study was to estimate (co)variance functions using random regression models on Legendre polynomials for the analysis of repeated measures of BW from birth to adult age. A total of 82,064 records from 8,145 females were analyzed. Different models were compared. The models included additive direct and maternal effects, and animal and maternal permanent environmental effects as random terms. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of animal age (cubic regression) were considered as random covariables. Eight models with polynomials of third to sixth order were used to describe additive direct and maternal effects, and animal and maternal permanent environmental effects. Residual effects were modeled using 1 (i.e., assuming homogeneity of variances across all ages) or 5 age classes. The model with 5 classes was the best to describe the trajectory of residuals along the growth curve. The model including fourth- and sixth-order polynomials for additive direct and animal permanent environmental effects, respectively, and third-order polynomials for maternal genetic and maternal permanent environmental effects were the best. Estimates of (co)variance obtained with the multi-trait and random regression models were similar. Direct heritability estimates obtained with the random regression models followed a trend similar to that obtained with the multi-trait model. The largest estimates of maternal heritability were those of BW taken close to 240 d of age. In general, estimates of correlation between BW from birth to 8 yr of age decreased with increasing distance between ages.
Riemannian multi-manifold modeling and clustering in brain networks

NASA Astrophysics Data System (ADS)

Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.

2017-08-01

This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brainnetwork time-series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points into the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positivedefinite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
Physical Interpretation of the Correlation Between Multi-Angle Spectral Data and Canopy Height

NASA Technical Reports Server (NTRS)

Schull, M. A.; Ganguly, S.; Samanta, A.; Huang, D.; Shabanov, N. V.; Jenkins, J. P.; Chiu, J. C.; Marshak, A.; Blair, J. B.; Myneni, R. B.;

2007-01-01

Recent empirical studies have shown that multi-angle spectral data can be useful for predicting canopy height, but the physical reason for this correlation was not understood. We follow the concept of canopy spectral invariants, specifically escape probability, to gain insight into the observed correlation. Airborne Multi-Angle Imaging Spectrometer (AirMISR) and airborne Laser Vegetation Imaging Sensor (LVIS) data acquired during a NASA Terrestrial Ecology Program aircraft campaign underlie our analysis. Two multivariate linear regression models were developed to estimate LVIS height measures from 28 AirMISR multi-angle spectral reflectances and from the spectrally invariant escape probability at 7 AirMISR view angles. Both models achieved nearly the same accuracy, suggesting that canopy spectral invariant theory can explain the observed correlation. We hypothesize that the escape probability is sensitive to the aspect ratio (crown diameter to crown height). The multi-angle spectral data alone therefore may not provide enough information to retrieve canopy height globally

Stochastic search, optimization and regression with energy applications

NASA Astrophysics Data System (ADS)

Hannah, Lauren A.

Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DPGLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.
Pseudo-second order models for the adsorption of safranin onto activated carbon: comparison of linear and non-linear regression methods.

PubMed

Kumar, K Vasanth

2007-04-02

Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
Multi-linear regression of sea level in the south west Pacific as a first step towards local sea level projections

NASA Astrophysics Data System (ADS)

Kumar, Vandhna; Meyssignac, Benoit; Melet, Angélique; Ganachaud, Alexandre

2017-04-01

Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years is up to 3 times the global average. In this study, we attempt to reconstruct sea levels at selected sites in the region (Suva, Lautoka, Noumea - Fiji and New Caledonia) as a mutiple-linear regression of atmospheric and oceanic variables. We focus on interannual-to-decadal scale variability, and lower (including the global mean sea level rise) over the 1979-2014 period. Sea levels are taken from tide gauge records and the ORAS4 reanalysis dataset, and are expressed as a sum of steric and mass changes as a preliminary step. The key development in our methodology is using leading wind stress curl as a proxy for the thermosteric component. This is based on the knowledge that wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. The analysis is primarily based on correlation between local sea level and selected predictors, the dominant one being wind stress curl. In the first step, proxy boxes for wind stress curl are determined via regions of highest correlation. The proportion of sea level explained via linear regression is then removed, leaving a residual. This residual is then correlated with other locally acting potential predictors: halosteric sea level, the zonal and meridional wind stress components, and sea surface temperature. The statistically significant predictors are used in a multi-linear regression function to simulate the observed sea level. The method is able to reproduce between 40 to 80% of the variance in observed sea level. Based on the skill of the model, it has high potential in sea level projection and downscaling studies.
Correlation and simple linear regression.

PubMed

Eberly, Lynn E

2007-01-01

This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling.

PubMed

Perdikaris, P; Raissi, M; Damianou, A; Lawrence, N D; Karniadakis, G E

2017-02-01

Multi-fidelity modelling enables accurate inference of quantities of interest by synergistically combining realizations of low-cost/low-fidelity models with a small set of high-fidelity observations. This is particularly effective when the low- and high-fidelity models exhibit strong correlations, and can lead to significant computational gains over approaches that solely rely on high-fidelity models. However, in many cases of practical interest, low-fidelity models can only be well correlated to their high-fidelity counterparts for a specific range of input parameters, and potentially return wrong trends and erroneous predictions if probed outside of their validity regime. Here we put forth a probabilistic framework based on Gaussian process regression and nonlinear autoregressive schemes that is capable of learning complex nonlinear and space-dependent cross-correlations between models of variable fidelity, and can effectively safeguard against low-fidelity models that provide wrong trends. This introduces a new class of multi-fidelity information fusion algorithms that provide a fundamental extension to the existing linear autoregressive methodologies, while still maintaining the same algorithmic complexity and overall computational cost. The performance of the proposed methods is tested in several benchmark problems involving both synthetic and real multi-fidelity datasets from computational fluid dynamics simulations.
Digital Image Restoration Under a Regression Model - The Unconstrained, Linear Equality and Inequality Constrained Approaches

DTIC Science & Technology

1974-01-01

REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
Advanced statistics: linear regression, part I: simple linear regression.

PubMed

Marill, Keith A

2004-01-01

Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
Modelling the Relationship Between Land Surface Temperature and Landscape Patterns of Land Use Land Cover Classification Using Multi Linear Regression Models

NASA Astrophysics Data System (ADS)

Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.

2016-06-01

The threat of the ailments related to urbanization like heat stress is very prevalent. There are a lot of things that can be done to lessen the effect of urbanization to the surface temperature of the area like using green roofs or planting trees in the area. So land use really matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important so as to provide a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC as well as to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and LULC classification was derived from LiDAR and Orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs and these metrics were analysed using a statistical framework. Multi linear regression was done to create models that would predict LST for each class and it was found that the spatial metric "Effective mesh size" was a top predictor for LST in 6 out of 7 classes. The model created can still be refined by adding a temporal aspect by analysing the LST of another farming period (for rural areas) and looking for common predictors between LSTs of these two different farming periods.
Groundwater-level prediction using multiple linear regression and artificial neural network techniques: a comparative assessment

NASA Astrophysics Data System (ADS)

Sahoo, Sasmita; Jha, Madan K.

2013-12-01

The potential of multiple linear regression (MLR) and artificial neural network (ANN) techniques in predicting transient water levels over a groundwater basin were compared. MLR and ANN modeling was carried out at 17 sites in Japan, considering all significant inputs: rainfall, ambient temperature, river stage, 11 seasonal dummy variables, and influential lags of rainfall, ambient temperature, river stage and groundwater level. Seventeen site-specific ANN models were developed, using multi-layer feed-forward neural networks trained with Levenberg-Marquardt backpropagation algorithms. The performance of the models was evaluated using statistical and graphical indicators. Comparison of the goodness-of-fit statistics of the MLR models with those of the ANN models indicated that there is better agreement between the ANN-predicted groundwater levels and the observed groundwater levels at all the sites, compared to the MLR. This finding was supported by the graphical indicators and the residual analysis. Thus, it is concluded that the ANN technique is superior to the MLR technique in predicting spatio-temporal distribution of groundwater levels in a basin. However, considering the practical advantages of the MLR technique, it is recommended as an alternative and cost-effective groundwater modeling tool.
Comparison of two-concentration with multi-concentration linear regressions: Retrospective data analysis of multiple regulated LC-MS bioanalytical projects.

PubMed

Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi

2013-09-01

Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore, examples are given as how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis and it significantly saves time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
Linear regression crash prediction models : issues and proposed solutions.

DOT National Transportation Integrated Search

2010-05-01

The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Multi-model ensemble combinations of the water budget in the East/Japan Sea

NASA Astrophysics Data System (ADS)

HAN, S.; Hirose, N.; Usui, N.; Miyazawa, Y.

2016-02-01

The water balance of East/Japan Sea is determined mainly by inflow and outflow through the Korea/Tsushima, Tsugaru and Soya/La Perouse Straits. However, the volume transports measured at three straits remain quantitatively unbalanced. This study examined the seasonal variation of the volume transport using the multiple linear regression and ridge regression of multi-model ensemble (MME) methods to estimate physically consistent circulation in East/Japan Sea by using four different data assimilation models. The MME outperformed all of the single models by reducing uncertainties, especially the multicollinearity problem with the ridge regression. However, the regression constants turned out to be inconsistent with each other if the MME was applied separately for each strait. The MME for a connected system was thus performed to find common constants for these straits. The estimation of this MME was found to be similar to the MME result of sea level difference (SLD). The estimated mean transport (2.42 Sv) was smaller than the measurement data at the Korea/Tsushima Strait, but the calibrated transport of the Tsugaru Strait (1.63 Sv) was larger than the observed data. The MME results of transport and SLD also suggested that the standard deviation (STD) of the Korea/Tsushima Strait is larger than the STD of the observation, whereas the estimated results were almost identical to that observed for the Tsugaru and Soya/La Perouse Straits. The similarity between MME results enhances the reliability of the present MME estimation.
Can single empirical algorithms accurately predict inland shallow water quality status from high resolution, multi-sensor, multi-temporal satellite data?

NASA Astrophysics Data System (ADS)

Theologou, I.; Patelaki, M.; Karantzalos, K.

2015-04-01

Assessing and monitoring water quality status through timely, cost effective and accurate manner is of fundamental importance for numerous environmental management and policy making purposes. Therefore, there is a current need for validated methodologies which can effectively exploit, in an unsupervised way, the enormous amount of earth observation imaging datasets from various high-resolution satellite multispectral sensors. To this end, many research efforts are based on building concrete relationships and empirical algorithms from concurrent satellite and in-situ data collection campaigns. We have experimented with Landsat 7 and Landsat 8 multi-temporal satellite data, coupled with hyperspectral data from a field spectroradiometer and in-situ ground truth data with several physico-chemical and other key monitoring indicators. All available datasets, covering a 4 years period, in our case study Lake Karla in Greece, were processed and fused under a quantitative evaluation framework. The performed comprehensive analysis posed certain questions regarding the applicability of single empirical models across multi-temporal, multi-sensor datasets towards the accurate prediction of key water quality indicators for shallow inland systems. Single linear regression models didn't establish concrete relations across multi-temporal, multi-sensor observations. Moreover, the shallower parts of the inland system followed, in accordance with the literature, different regression patterns. Landsat 7 and 8 resulted in quite promising results indicating that from the recreation of the lake and onward consistent per-sensor, per-depth prediction models can be successfully established. The highest rates were for chl-a (r2=89.80%), dissolved oxygen (r2=88.53%), conductivity (r2=88.18%), ammonium (r2=87.2%) and pH (r2=86.35%), while the total phosphorus (r2=70.55%) and nitrates (r2=55.50%) resulted in lower correlation rates.
Fuzzy regression modeling for tool performance prediction and degradation detection.

PubMed

Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L

2010-10-01

In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.

Simplified large African carnivore density estimators from track indices.

PubMed

Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J

2016-01-01

The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y = αx + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
Estimating severity of sideways fall using a generic multi linear regression model based on kinematic input variables.

PubMed

van der Zijden, A M; Groen, B E; Tanck, E; Nienhuis, B; Verdonschot, N; Weerdesteyn, V

2017-03-21

Many research groups have studied fall impact mechanics to understand how fall severity can be reduced to prevent hip fractures. Yet, direct impact force measurements with force plates are restricted to a very limited repertoire of experimental falls. The purpose of this study was to develop a generic model for estimating hip impact forces (i.e. fall severity) in in vivo sideways falls without the use of force plates. Twelve experienced judokas performed sideways Martial Arts (MA) and Block ('natural') falls on a force plate, both with and without a mat on top. Data were analyzed to determine the hip impact force and to derive 11 selected (subject-specific and kinematic) variables. Falls from kneeling height were used to perform a stepwise regression procedure to assess the effects of these input variables and build the model. The final model includes four input variables, involving one subject-specific measure and three kinematic variables: maximum upper body deceleration, body mass, shoulder angle at the instant of 'maximum impact' and maximum hip deceleration. The results showed that estimated and measured hip impact forces were linearly related (explained variances ranging from 46 to 63%). Hip impact forces of MA falls onto the mat from a standing position (3650±916N) estimated by the final model were comparable with measured values (3698±689N), even though these data were not used for training the model. In conclusion, a generic linear regression model was developed that enables the assessment of fall severity through kinematic measures of sideways falls, without using force plates. Copyright © 2017 Elsevier Ltd. All rights reserved.
The microcomputer scientific software series 2: general linear model--regression.

Treesearch

Harold M. Rauscher

1983-01-01

The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Multi-linear regression models predict the effects of water chemistry on acute lead toxicity to Ceriodaphnia dubia and Pimephales promelas.

PubMed

Esbaugh, A J; Brix, K V; Mager, E M; Grosell, M

2011-09-01

The current study examined the acute toxicity of lead (Pb) to Ceriodaphnia dubia and Pimephales promelas in a variety of natural waters. The natural waters were selected to range in pertinent water chemistry parameters such as calcium, pH, total CO(2) and dissolved organic carbon (DOC). Acute toxicity was determined for C. dubia and P. promelas using standard 48h and 96h protocols, respectively. For both organisms acute toxicity varied markedly according to water chemistry, with C. dubia LC50s ranging from 29 to 180μg/L and P. promelas LC50s ranging from 41 to 3598μg/L. Additionally, no Pb toxicity was observed for P. promelas in three alkaline natural waters. With respect to water chemistry parameters, DOC had the strongest protective impact for both organisms. A multi-linear regression (MLR) approach combining previous lab data and the current data was used to identify the relative importance of individual water chemistry components in predicting acute Pb toxicity for both species. As anticipated, the P. promelas best-fit MLR model combined DOC, calcium and pH. Unexpectedly, in the C. dubiaMLR model the importance of pH, TCO(2) and calcium was minimal while DOC and ionic strength were the controlling water quality variables. Adjusted R(2) values of 0.82 and 0.64 for the P. promelas and C. dubia models, respectively, are comparable to previously developed biotic ligand models for other metals. Copyright © 2011 Elsevier Inc. All rights reserved.
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

ERIC Educational Resources Information Center

Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

2013-01-01

In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Evaluation of force-velocity and power-velocity relationship of arm muscles.

PubMed

Sreckovic, Sreten; Cuk, Ivan; Djuric, Sasa; Nedeljkovic, Aleksandar; Mirkov, Dragan; Jaric, Slobodan

2015-08-01

A number of recent studies have revealed an approximately linear force-velocity (F-V) and, consequently, a parabolic power-velocity (P-V) relationship of multi-joint tasks. However, the measurement characteristics of their parameters have been neglected, particularly those regarding arm muscles, which could be a problem for using the linear F-V model in both research and routine testing. Therefore, the aims of the present study were to evaluate the strength, shape, reliability, and concurrent validity of the F-V relationship of arm muscles. Twelve healthy participants performed maximum bench press throws against loads ranging from 20 to 70 % of their maximum strength, and linear regression model was applied on the obtained range of F and V data. One-repetition maximum bench press and medicine ball throw tests were also conducted. The observed individual F-V relationships were exceptionally strong (r = 0.96-0.99; all P < 0.05) and fairly linear, although it remains unresolved whether a polynomial fit could provide even stronger relationships. The reliability of parameters obtained from the linear F-V regressions proved to be mainly high (ICC > 0.80), while their concurrent validity regarding directly measured F, P, and V ranged from high (for maximum F) to medium-to-low (for maximum P and V). The findings add to the evidence that the linear F-V and, consequently, parabolic P-V models could be used to study the mechanical properties of muscular systems, as well as to design a relatively simple, reliable, and ecologically valid routine test of the muscle ability of force, power, and velocity production.
Comparing Different Approaches for Mapping Urban Vegetation Cover from Landsat ETM+ Data: A Case Study on Brussels

PubMed Central

Van de Voorde, Tim; Vlaeminck, Jeroen; Canters, Frank

2008-01-01

Urban growth and its related environmental problems call for sustainable urban management policies to safeguard the quality of urban environments. Vegetation plays an important part in this as it provides ecological, social, health and economic benefits to a city's inhabitants. Remotely sensed data are of great value to monitor urban green and despite the clear advantages of contemporary high resolution images, the benefits of medium resolution data should not be discarded. The objective of this research was to estimate fractional vegetation cover from a Landsat ETM+ image with sub-pixel classification, and to compare accuracies obtained with multiple stepwise regression analysis, linear spectral unmixing and multi-layer perceptrons (MLP) at the level of meaningful urban spatial entities. Despite the small, but nevertheless statistically significant differences at pixel level between the alternative approaches, the spatial pattern of vegetation cover and estimation errors is clearly distinctive at neighbourhood level. At this spatially aggregated level, a simple regression model appears to attain sufficient accuracy. For mapping at a spatially more detailed level, the MLP seems to be the most appropriate choice. Brightness normalisation only appeared to affect the linear models, especially the linear spectral unmixing. PMID:27879914
Interpretation of commonly used statistical regression models.

PubMed

Kasza, Jessica; Wolfe, Rory

2014-01-01

A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Temporal Drivers of Liking Based on Functional Data Analysis and Non-Additive Models for Multi-Attribute Time-Intensity Data of Fruit Chews.

PubMed

Kuesten, Carla; Bi, Jian

2018-06-03

Conventional drivers of liking analysis was extended with a time dimension into temporal drivers of liking (TDOL) based on functional data analysis methodology and non-additive models for multiple-attribute time-intensity (MATI) data. The non-additive models, which consider both direct effects and interaction effects of attributes to consumer overall liking, include Choquet integral and fuzzy measure in the multi-criteria decision-making, and linear regression based on variance decomposition. Dynamics of TDOL, i.e., the derivatives of the relative importance functional curves were also explored. Well-established R packages 'fda', 'kappalab' and 'relaimpo' were used in the paper for developing TDOL. Applied use of these methods shows that the relative importance of MATI curves offers insights for understanding the temporal aspects of consumer liking for fruit chews.
OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

PubMed

Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

2012-01-01

The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

PubMed Central

Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

2014-01-01

The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPREM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844
A novel method for determination of aragonite saturation state on the continental shelf of central Oregon using multi-parameter relationships with hydrographic data

NASA Astrophysics Data System (ADS)

Juranek, L. W.; Feely, R. A.; Peterson, W. T.; Alin, S. R.; Hales, B.; Lee, K.; Sabine, C. L.; Peterson, J.

2009-12-01

We developed a multiple linear regression model to robustly determine aragonite saturation state (Ωarag) from observations of temperature and oxygen (R2 = 0.987, RMS error 0.053), using data collected in the Pacific Northwest region in late May 2007. The seasonal evolution of Ωarag near central Oregon was evaluated by applying the regression model to a monthly (winter)/bi-weekly (summer) water-column hydrographic time-series collected over the shelf and slope in 2007. The Ωarag predicted by the regression model was less than 1, the thermodynamic calcification/dissolution threshold, over shelf/slope bottom waters throughout the entire 2007 upwelling season (May-November), with the Ωarag = 1 horizon shoaling to 30 m by late summer. The persistence of water with Ωarag < 1 on the continental shelf has not been previously noted and could have notable ecological consequences for benthic and pelagic calcifying organisms such as mussels, oysters, abalone, echinoderms, and pteropods.
An Expert System for the Evaluation of Cost Models

DTIC Science & Technology

1990-09-01

contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
Simulation of groundwater level variations using wavelet combined with neural network, linear regression and support vector machine

NASA Astrophysics Data System (ADS)

Ebrahimi, Hadi; Rajaee, Taher

2017-01-01

Simulation of groundwater level (GWL) fluctuations is an important task in management of groundwater resources. In this study, the effect of wavelet analysis on the training of the artificial neural network (ANN), multi linear regression (MLR) and support vector regression (SVR) approaches was investigated, and the ANN, MLR and SVR along with the wavelet-ANN (WNN), wavelet-MLR (WLR) and wavelet-SVR (WSVR) models were compared in simulating one-month-ahead of GWL. The only variable used to develop the models was the monthly GWL data recorded over a period of 11 years from two wells in the Qom plain, Iran. The results showed that decomposing GWL time series into several sub-time series, extremely improved the training of the models. For both wells 1 and 2, the Meyer and Db5 wavelets produced better results compared to the other wavelets; which indicated wavelet types had similar behavior in similar case studies. The optimal number of delays was 6 months, which seems to be due to natural phenomena. The best WNN model, using Meyer mother wavelet with two decomposition levels, simulated one-month-ahead with RMSE values being equal to 0.069 m and 0.154 m for wells 1 and 2, respectively. The RMSE values for the WLR model were 0.058 m and 0.111 m, and for WSVR model were 0.136 m and 0.060 m for wells 1 and 2, respectively.
A Constrained Linear Estimator for Multiple Regression

ERIC Educational Resources Information Center

Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

2010-01-01

"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Estimating linear temporal trends from aggregated environmental monitoring data

USGS Publications Warehouse

Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.

2017-01-01

Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.
3D Multi-segment foot kinematics in children: A developmental study in typically developing boys.

PubMed

Deschamps, Kevin; Staes, Filip; Peerlinck, Kathelijne; Van Geet, Christel; Hermans, Cedric; Matricali, Giovanni Arnoldo; Lobet, Sebastien

2017-02-01

The relationship between age and 3D rotations objectivized with multisegment foot models has not been quantified until now. The purpose of this study was therefore to investigate the relationship between age and multi-segment foot kinematics in a cross-sectional database. Barefoot multi-segment foot kinematics of thirty two typically developing boys, aged 6-20 years, were captured with the Rizzoli Multi-segment Foot Model. One-dimensional statistical parametric mapping linear regression was used to examine the relationship between age and 3D inter-segment rotations of the dominant leg during the full gait cycle. Age was significantly correlated with sagittal plane kinematics of the midfoot and the calcaneus-metatarsus inter-segment angle (p<0.0125). Age was also correlated with the transverse plane kinematics of the calcaneus-metatarsus angle (p<0.0001). Gait labs should consider age related differences and variability if optimal decision making is pursued. It remains unclear if this is of interest for all foot models, however, the current study highlights that this is of particular relevance for foot models which incorporate a separate midfoot segment. Copyright © 2016 Elsevier B.V. All rights reserved.
Unitary Response Regression Models

ERIC Educational Resources Information Center

Lipovetsky, S.

2007-01-01

The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
New Approach To Hour-By-Hour Weather Forecast

NASA Astrophysics Data System (ADS)

Liao, Q. Q.; Wang, B.

2017-12-01

Fine hourly forecast in single station weather forecast is required in many human production and life application situations. Most previous MOS (Model Output Statistics) which used a linear regression model are hard to solve nonlinear natures of the weather prediction and forecast accuracy has not been sufficient at high temporal resolution. This study is to predict the future meteorological elements including temperature, precipitation, relative humidity and wind speed in a local region over a relatively short period of time at hourly level. By means of hour-to-hour NWP (Numeral Weather Prediction)meteorological field from Forcastio (https://darksky.net/dev/docs/forecast) and real-time instrumental observation including 29 stations in Yunnan and 3 stations in Tianjin of China from June to October 2016, predictions are made of the 24-hour hour-by-hour ahead. This study presents an ensemble approach to combine the information of instrumental observation itself and NWP. Use autoregressive-moving-average (ARMA) model to predict future values of the observation time series. Put newest NWP products into the equations derived from the multiple linear regression MOS technique. Handle residual series of MOS outputs with autoregressive (AR) model for the linear property presented in time series. Due to the complexity of non-linear property of atmospheric flow, support vector machine (SVM) is also introduced . Therefore basic data quality control and cross validation makes it able to optimize the model function parameters , and do 24 hours ahead residual reduction with AR/SVM model. Results show that AR model technique is better than corresponding multi-variant MOS regression method especially at the early 4 hours when the predictor is temperature. MOS-AR combined model which is comparable to MOS-SVM model outperform than MOS. Both of their root mean square error and correlation coefficients for 2 m temperature are reduced to 1.6 degree Celsius and 0.91 respectively. The forecast accuracy of 24- hour forecast deviation no more than 2 degree Celsius is 78.75 % for MOS-AR model and 81.23 % for AR model.
Bayesian Travel Time Inversion adopting Gaussian Process Regression

NASA Astrophysics Data System (ADS)

Mauerberger, S.; Holschneider, M.

2017-12-01

A major application in seismology is the determination of seismic velocity models. Travel time measurements are putting an integral constraint on the velocity between source and receiver. We provide insight into travel time inversion from a correlation-based Bayesian point of view. Therefore, the concept of Gaussian process regression is adopted to estimate a velocity model. The non-linear travel time integral is approximated by a 1st order Taylor expansion. A heuristic covariance describes correlations amongst observations and a priori model. That approach enables us to assess a proxy of the Bayesian posterior distribution at ordinary computational costs. No multi dimensional numeric integration nor excessive sampling is necessary. Instead of stacking the data, we suggest to progressively build the posterior distribution. Incorporating only a single evidence at a time accounts for the deficit of linearization. As a result, the most probable model is given by the posterior mean whereas uncertainties are described by the posterior covariance.As a proof of concept, a synthetic purely 1d model is addressed. Therefore a single source accompanied by multiple receivers is considered on top of a model comprising a discontinuity. We consider travel times of both phases - direct and reflected wave - corrupted by noise. Left and right of the interface are assumed independent where the squared exponential kernel serves as covariance.

Gas demand forecasting by a new artificial intelligent algorithm

NASA Astrophysics Data System (ADS)

Khatibi. B, Vahid; Khatibi, Elham

2012-01-01

Energy demand forecasting is a key issue for consumers and generators in all energy markets in the world. This paper presents a new forecasting algorithm for daily gas demand prediction. This algorithm combines a wavelet transform and forecasting models such as multi-layer perceptron (MLP), linear regression or GARCH. The proposed method is applied to real data from the UK gas markets to evaluate their performance. The results show that the forecasting accuracy is improved significantly by using the proposed method.
Wavelet-linear genetic programming: A new approach for modeling monthly streamflow

NASA Astrophysics Data System (ADS)

Ravansalar, Masoud; Rajaee, Taher; Kisi, Ozgur

2017-06-01

The streamflows are important and effective factors in stream ecosystems and its accurate prediction is an essential and important issue in water resources and environmental engineering systems. A hybrid wavelet-linear genetic programming (WLGP) model, which includes a discrete wavelet transform (DWT) and a linear genetic programming (LGP) to predict the monthly streamflow (Q) in two gauging stations, Pataveh and Shahmokhtar, on the Beshar River at the Yasuj, Iran were used in this study. In the proposed WLGP model, the wavelet analysis was linked to the LGP model where the original time series of streamflow were decomposed into the sub-time series comprising wavelet coefficients. The results were compared with the single LGP, artificial neural network (ANN), a hybrid wavelet-ANN (WANN) and Multi Linear Regression (MLR) models. The comparisons were done by some of the commonly utilized relevant physical statistics. The Nash coefficients (E) were found as 0.877 and 0.817 for the WLGP model, for the Pataveh and Shahmokhtar stations, respectively. The comparison of the results showed that the WLGP model could significantly increase the streamflow prediction accuracy in both stations. Since, the results demonstrate a closer approximation of the peak streamflow values by the WLGP model, this model could be utilized for the simulation of cumulative streamflow data prediction in one month ahead.
A comparison of two adaptive multivariate analysis methods (PLSR and ANN) for winter wheat yield forecasting using Landsat-8 OLI images

NASA Astrophysics Data System (ADS)

Chen, Pengfei; Jing, Qi

2017-02-01

An assumption that the non-linear method is more reasonable than the linear method when canopy reflectance is used to establish the yield prediction model was proposed and tested in this study. For this purpose, partial least squares regression (PLSR) and artificial neural networks (ANN), represented linear and non-linear analysis method, were applied and compared for wheat yield prediction. Multi-period Landsat-8 OLI images were collected at two different wheat growth stages, and a field campaign was conducted to obtain grain yields at selected sampling sites in 2014. The field data were divided into a calibration database and a testing database. Using calibration data, a cross-validation concept was introduced for the PLSR and ANN model construction to prevent over-fitting. All models were tested using the test data. The ANN yield-prediction model produced R2, RMSE and RMSE% values of 0.61, 979 kg ha-1, and 10.38%, respectively, in the testing phase, performing better than the PLSR yield-prediction model, which produced R2, RMSE, and RMSE% values of 0.39, 1211 kg ha-1, and 12.84%, respectively. Non-linear method was suggested as a better method for yield prediction.
Multi-ethnic analysis reveals soluble L-selectin may be post-transcriptionally regulated by 3′UTR polymorphism: the Multi-Ethnic Study of Atherosclerosis (MESA)

PubMed Central

Berardi, Cecilia; Larson, Nicholas B.; Decker, Paul A.; Wassel, Christina L.; Kirsch, Phillip S.; Pankow, James S.; Sale, Michele M.; de Andrade, Mariza; Sicotte, Hugues; Tang, Weihong; Hanson, Naomi Q.; Tsai, Michael Y.; da Chen, Yii-Der I; Bielinski, Suzette J.

2015-01-01

L-selectin is constitutively expressed on leukocytes and mediates their interaction with endothelial cells during inflammation. Previous studies on the association of soluble L-selectin (sL-selectin) with cardiovascular disease (CVD) are inconsistent. Genetic variants associated with sL-selectin levels may be a better surrogate of levels over a lifetime. We explored the association of genetic variants and sL-selectin levels in a race/ethnicity stratified random sample of 2,403 participants in the Multi-Ethnic Study of Atherosclerosis (MESA). Through a genome-wide analysis with additive linear regression models, we found that rs12938 on the SELL gene accounted for a significant portion of the protein level variance across all four races/ethnicities. To evaluate potential additional associations, elastic net models were used for variants located in the SELL/SELP/SELE genetic region and an additional two SNPs, rs3917768 and rs4987361, were associated with sL-selectin levels in African Americans. These variants accounted for a portion of protein variance that ranged from 4% in Hispanic to 14% in African Americans. To investigate the relationship of these variants with CVD, 6,317 subjects were used. No significant association was found between any of the identified SNPs and carotid intima-media thickness or presence of carotid plaque using linear and logistic regression, respectively. Similarly no significant results were found for coronary artery calcium or coronary heart disease events. In conclusion, we found that variants within the SELL gene are associated with sL-selectin levels. Despite accounting for a significant portion of the protein level variance, none of the variants was associated with clinical or subclinical CVD. PMID:25576479
Accounting for autocorrelation in multi-drug resistant tuberculosis predictors using a set of parsimonious orthogonal eigenvectors aggregated in geographic space.

PubMed

Jacob, Benjamin J; Krapp, Fiorella; Ponce, Mario; Gottuzzo, Eduardo; Griffith, Daniel A; Novak, Robert J

2010-05-01

Spatial autocorrelation is problematic for classical hierarchical cluster detection tests commonly used in multi-drug resistant tuberculosis (MDR-TB) analyses as considerable random error can occur. Therefore, when MDRTB clusters are spatially autocorrelated the assumption that the clusters are independently random is invalid. In this research, a product moment correlation coefficient (i.e., the Moran's coefficient) was used to quantify local spatial variation in multiple clinical and environmental predictor variables sampled in San Juan de Lurigancho, Lima, Peru. Initially, QuickBird 0.61 m data, encompassing visible bands and the near infra-red bands, were selected to synthesize images of land cover attributes of the study site. Data of residential addresses of individual patients with smear-positive MDR-TB were geocoded, prevalence rates calculated and then digitally overlaid onto the satellite data within a 2 km buffer of 31 georeferenced health centers, using a 10 m2 grid-based algorithm. Geographical information system (GIS)-gridded measurements of each health center were generated based on preliminary base maps of the georeferenced data aggregated to block groups and census tracts within each buffered area. A three-dimensional model of the study site was constructed based on a digital elevation model (DEM) to determine terrain covariates associated with the sampled MDR-TB covariates. Pearson's correlation was used to evaluate the linear relationship between the DEM and the sampled MDR-TB data. A SAS/GIS(R) module was then used to calculate univariate statistics and to perform linear and non-linear regression analyses using the sampled predictor variables. The estimates generated from a global autocorrelation analyses were then spatially decomposed into empirical orthogonal bases using a negative binomial regression with a non-homogeneous mean. Results of the DEM analyses indicated a statistically non-significant, linear relationship between georeferenced health centers and the sampled covariate elevation. The data exhibited positive spatial autocorrelation and the decomposition of Moran's coefficient into uncorrelated, orthogonal map pattern components revealed global spatial heterogeneities necessary to capture latent autocorrelation in the MDR-TB model. It was thus shown that Poisson regression analyses and spatial eigenvector mapping can elucidate the mechanics of MDR-TB transmission by prioritizing clinical and environmental-sampled predictor variables for identifying high risk populations.
Development of Ensemble Model Based Water Demand Forecasting Model

NASA Astrophysics Data System (ADS)

Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop

2014-05-01

In recent years, Smart Water Grid (SWG) concept has globally emerged over the last decade and also gained significant recognition in South Korea. Especially, there has been growing interest in water demand forecast and optimal pump operation and this has led to various studies regarding energy saving and improvement of water supply reliability. Existing water demand forecasting models are categorized into two groups in view of modeling and predicting their behavior in time series. One is to consider embedded patterns such as seasonality, periodicity and trends, and the other one is an autoregressive model that is using short memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of the abovementioned model is that there is a limit to predictability of water demands of about sub-daily scale because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allow us to estimate uncertainties across different model classes. The proposed model is consist of two parts. One is a multi-model scheme that is based on combination of independent prediction model. The other one is a cross validation scheme named Bagging approach introduced by Brieman (1996) to derive weighting factors corresponding to individual models. Individual forecasting models that used in this study are linear regression analysis model, polynomial regression, multivariate adaptive regression splines(MARS), SVM(support vector machine). The concepts are demonstrated through application to observed from water plant at several locations in the South Korea. Keywords: water demand, non-linear model, the ensemble forecasting model, uncertainty. Acknowledgements This subject is supported by Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)
[From clinical judgment to linear regression model.

PubMed

Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

2013-01-01

When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Advanced statistics: linear regression, part II: multiple linear regression.

PubMed

Marill, Keith A

2004-01-01

The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Covariate Selection for Multilevel Models with Missing Data

PubMed Central

Marino, Miguel; Buxton, Orfeu M.; Li, Yi

2017-01-01

Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
Sensitivity of Chemical Shift-Encoded Fat Quantification to Calibration of Fat MR Spectrum

PubMed Central

Wang, Xiaoke; Hernando, Diego; Reeder, Scott B.

2015-01-01

Purpose To evaluate the impact of different fat spectral models on proton density fat-fraction (PDFF) quantification using chemical shift-encoded (CSE) MRI. Material and Methods Simulations and in vivo imaging were performed. In a simulation study, spectral models of fat were compared pairwise. Comparison of magnitude fitting and mixed fitting was performed over a range of echo times and fat fractions. In vivo acquisitions from 41 patients were reconstructed using 7 published spectral models of fat. T2-corrected STEAM-MRS was used as reference. Results Simulations demonstrate that imperfectly calibrated spectral models of fat result in biases that depend on echo times and fat fraction. Mixed fitting is more robust against this bias than magnitude fitting. Multi-peak spectral models showed much smaller differences among themselves than when compared to the single-peak spectral model. In vivo studies show all multi-peak models agree better (for mixed fitting, slope ranged from 0.967–1.045 using linear regression) with reference standard than the single-peak model (for mixed fitting, slope=0.76). Conclusion It is essential to use a multi-peak fat model for accurate quantification of fat with CSE-MRI. Further, fat quantification techniques using multi-peak fat models are comparable and no specific choice of spectral model is shown to be superior to the rest. PMID:25845713
An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

ERIC Educational Resources Information Center

Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

2011-01-01

This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Multi-disease analysis of maternal antibody decay using non-linear mixed models accounting for censoring.

PubMed

Goeyvaerts, Nele; Leuridan, Elke; Faes, Christel; Van Damme, Pierre; Hens, Niel

2015-09-10

Biomedical studies often generate repeated measures of multiple outcomes on a set of subjects. It may be of interest to develop a biologically intuitive model for the joint evolution of these outcomes while assessing inter-subject heterogeneity. Even though it is common for biological processes to entail non-linear relationships, examples of multivariate non-linear mixed models (MNMMs) are still fairly rare. We contribute to this area by jointly analyzing the maternal antibody decay for measles, mumps, rubella, and varicella, allowing for a different non-linear decay model for each infectious disease. We present a general modeling framework to analyze multivariate non-linear longitudinal profiles subject to censoring, by combining multivariate random effects, non-linear growth and Tobit regression. We explore the hypothesis of a common infant-specific mechanism underlying maternal immunity using a pairwise correlated random-effects approach and evaluating different correlation matrix structures. The implied marginal correlation between maternal antibody levels is estimated using simulations. The mean duration of passive immunity was less than 4 months for all diseases with substantial heterogeneity between infants. The maternal antibody levels against rubella and varicella were found to be positively correlated, while little to no correlation could be inferred for the other disease pairs. For some pairs, computational issues occurred with increasing correlation matrix complexity, which underlines the importance of further developing estimation methods for MNMMs. Copyright © 2015 John Wiley & Sons, Ltd.
Factors associated with single-vehicle and multi-vehicle road traffic collision injuries in Ireland.

PubMed

Donnelly-Swift, Erica; Kelly, Alan

2016-12-01

Generalised linear regression models were used to identify factors associated with fatal/serious road traffic collision injuries for single- and multi-vehicle collisions. Single-vehicle collisions and multi-vehicle collisions occurring during the hours of darkness or on a wet road surface had reduced likelihood of a fatal/serious injury. Single-vehicle 'driver with passengers' collisions occurring at junctions or on a hill/gradient were less likely to result in a fatal/serious injury. Multi-vehicle rear-end/angle collisions had reduced likelihood of a fatal/serious injury. Single-vehicle 'driver only' collisions and multi-vehicle collisions occurring on a public/bank holiday or on a hill/gradient were more likely to result in a fatal/serious injury. Single-vehicle collisions involving male drivers had increased likelihood of a fatal/serious injury and single-vehicle 'driver with passengers' collisions involving drivers under the age of 25 years also had increased likelihood of a fatal/serious injury. Findings can enlighten decision-makers to circumstances leading to fatal/serious injuries.
Estimating PM2.5 Concentrations in Xi'an City Using a Generalized Additive Model with Multi-Source Monitoring Data

PubMed Central

Song, Yong-Ze; Yang, Hong-Lei; Peng, Jun-Huan; Song, Yi-Rong; Sun, Qian; Li, Yuan

2015-01-01

Particulate matter with an aerodynamic diameter <2.5 μm (PM2.5) represents a severe environmental problem and is of negative impact on human health. Xi'an City, with a population of 6.5 million, is among the highest concentrations of PM2.5 in China. In 2013, in total, there were 191 days in Xi’an City on which PM2.5 concentrations were greater than 100 μg/m3. Recently, a few studies have explored the potential causes of high PM2.5 concentration using remote sensing data such as the MODIS aerosol optical thickness (AOT) product. Linear regression is a commonly used method to find statistical relationships among PM2.5 concentrations and other pollutants, including CO, NO2, SO2, and O3, which can be indicative of emission sources. The relationships of these variables, however, are usually complicated and non-linear. Therefore, a generalized additive model (GAM) is used to estimate the statistical relationships between potential variables and PM2.5 concentrations. This model contains linear functions of SO2 and CO, univariate smoothing non-linear functions of NO2, O3, AOT and temperature, and bivariate smoothing non-linear functions of location and wind variables. The model can explain 69.50% of PM2.5 concentrations, with R2 = 0.691, which improves the result of a stepwise linear regression (R2 = 0.582) by 18.73%. The two most significant variables, CO concentration and AOT, represent 20.65% and 19.54% of the deviance, respectively, while the three other gas-phase concentrations, SO2, NO2, and O3 account for 10.88% of the total deviance. These results show that in Xi'an City, the traffic and other industrial emissions are the primary source of PM2.5. Temperature, location, and wind variables also non-linearly related with PM2.5. PMID:26540446
Comparative data mining analysis for information retrieval of MODIS images: monitoring lake turbidity changes at Lake Okeechobee, Florida

NASA Astrophysics Data System (ADS)

Chang, Ni-Bin; Daranpob, Ammarin; Yang, Y. Jeffrey; Jin, Kang-Ren

2009-09-01

In the remote sensing field, a frequently recurring question is: Which computational intelligence or data mining algorithms are most suitable for the retrieval of essential information given that most natural systems exhibit very high non-linearity. Among potential candidates might be empirical regression, neural network model, support vector machine, genetic algorithm/genetic programming, analytical equation, etc. This paper compares three types of data mining techniques, including multiple non-linear regression, artificial neural networks, and genetic programming, for estimating multi-temporal turbidity changes following hurricane events at Lake Okeechobee, Florida. This retrospective analysis aims to identify how the major hurricanes impacted the water quality management in 2003-2004. The Moderate Resolution Imaging Spectroradiometer (MODIS) Terra 8-day composite imageries were used to retrieve the spatial patterns of turbidity distributions for comparison against the visual patterns discernible in the in-situ observations. By evaluating four statistical parameters, the genetic programming model was finally selected as the most suitable data mining tool for classification in which the MODIS band 1 image and wind speed were recognized as the major determinants by the model. The multi-temporal turbidity maps generated before and after the major hurricane events in 2003-2004 showed that turbidity levels were substantially higher after hurricane episodes. The spatial patterns of turbidity confirm that sediment-laden water travels to the shore where it reduces the intensity of the light necessary to submerged plants for photosynthesis. This reduction results in substantial loss of biomass during the post-hurricane period.
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

PubMed

Ferrari, Alberto; Comelli, Mario

2016-12-01

In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. This clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods is available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
Comparing The Effectiveness of a90/95 Calculations (Preprint)

DTIC Science & Technology

2006-09-01

Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

DTIC Science & Technology

2015-07-15

Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST...34Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors
A comparison of radiometric correction techniques in the evaluation of the relationship between LST and NDVI in Landsat imagery.

PubMed

Tan, Kok Chooi; Lim, Hwee San; Matjafri, Mohd Zubir; Abdullah, Khiruddin

2012-06-01

Atmospheric corrections for multi-temporal optical satellite images are necessary, especially in change detection analyses, such as normalized difference vegetation index (NDVI) rationing. Abrupt change detection analysis using remote-sensing techniques requires radiometric congruity and atmospheric correction to monitor terrestrial surfaces over time. Two atmospheric correction methods were used for this study: relative radiometric normalization and the simplified method for atmospheric correction (SMAC) in the solar spectrum. A multi-temporal data set consisting of two sets of Landsat images from the period between 1991 and 2002 of Penang Island, Malaysia, was used to compare NDVI maps, which were generated using the proposed atmospheric correction methods. Land surface temperature (LST) was retrieved using ATCOR3_T in PCI Geomatica 10.1 image processing software. Linear regression analysis was utilized to analyze the relationship between NDVI and LST. This study reveals that both of the proposed atmospheric correction methods yielded high accuracy through examination of the linear correlation coefficients. To check for the accuracy of the equation obtained through linear regression analysis for every single satellite image, 20 points were randomly chosen. The results showed that the SMAC method yielded a constant value (in terms of error) to predict the NDVI value from linear regression analysis-derived equation. The errors (average) from both proposed atmospheric correction methods were less than 10%.
Analyzing Multilevel Data: An Empirical Comparison of Parameter Estimates of Hierarchical Linear Modeling and Ordinary Least Squares Regression

ERIC Educational Resources Information Center

Rocconi, Louis M.

2011-01-01

Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…

A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

EPA Science Inventory

We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

PubMed

Watanabe, Hiroyuki; Miyazaki, Hiroyasu

2006-01-01

Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

PubMed

Anderson, Carl A; McRae, Allan F; Visscher, Peter M

2006-07-01

Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Guidelines and Procedures for Computing Time-Series Suspended-Sediment Concentrations and Loads from In-Stream Turbidity-Sensor and Streamflow Data

USGS Publications Warehouse

Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.

2009-01-01

In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
Comparison of conceptually based and regression rainfall-runoff models, Denver Metropolitan area, Colorado, and potential applications in urban areas

USGS Publications Warehouse

Lindner-Lunsford, J. B.; Ellis, S.R.

1987-01-01

Multievent, conceptually based models and a single-event, multiple linear-regression model for estimating storm-runoff quantity and quality from urban areas were calibrated and verified for four small (57 to 167 acres) basins in the Denver metropolitan area, Colorado. The basins represented different land-use types - light commercial, single-family housing, and multi-family housing. Both types of models were calibrated using the same data set for each basin. A comparison was made between the storm-runoff volume, peak flow, and storm-runoff loads of seven water quality constituents simulated by each of the models by use of identical verification data sets. The models studied were the U.S. Geological Survey 's Distributed Routing Rainfall-Runoff Model-Version II (DR3M-II) (a runoff-quantity model designed for urban areas), and a multievent urban runoff quality model (DR3M-QUAL). Water quality constituents modeled were chemical oxygen demand, total suspended solids, total nitrogen, total phosphorus, total lead, total manganese, and total zinc. (USGS)
Computation of nonlinear least squares estimator and maximum likelihood using principles in matrix calculus

NASA Astrophysics Data System (ADS)

Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.

2017-11-01

This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
[Application of artificial neural networks on the prediction of surface ozone concentrations].

PubMed

Shen, Lu-Lu; Wang, Yu-Xuan; Duan, Lei

2011-08-01

Ozone is an important secondary air pollutant in the lower atmosphere. In order to predict the hourly maximum ozone one day in advance based on the meteorological variables for the Wanqingsha site in Guangzhou, Guangdong province, a neural network model (Multi-Layer Perceptron) and a multiple linear regression model were used and compared. Model inputs are meteorological parameters (wind speed, wind direction, air temperature, relative humidity, barometric pressure and solar radiation) of the next day and hourly maximum ozone concentration of the previous day. The OBS (optimal brain surgeon) was adopted to prune the neutral work, to reduce its complexity and to improve its generalization ability. We find that the pruned neural network has the capacity to predict the peak ozone, with an agreement index of 92.3%, the root mean square error of 0.0428 mg/m3, the R-square of 0.737 and the success index of threshold exceedance 77.0% (the threshold O3 mixing ratio of 0.20 mg/m3). When the neural classifier was added to the neural network model, the success index of threshold exceedance increased to 83.6%. Through comparison of the performance indices between the multiple linear regression model and the neural network model, we conclud that that neural network is a better choice to predict peak ozone from meteorological forecast, which may be applied to practical prediction of ozone concentration.
Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

ERIC Educational Resources Information Center

Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

2006-01-01

Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

PubMed

Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

2016-01-01

Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
A multi-criteria index for ecological evaluation of tropical agriculture in southeastern Mexico.

PubMed

Huerta, Esperanza; Kampichler, Christian; Ochoa-Gaona, Susana; De Jong, Ben; Hernandez-Daumas, Salvador; Geissen, Violette

2014-01-01

The aim of this study was to generate an easy to use index to evaluate the ecological state of agricultural land from a sustainability perspective. We selected environmental indicators, such as the use of organic soil amendments (green manure) versus chemical fertilizers, plant biodiversity (including crop associations), variables which characterize soil conservation of conventional agricultural systems, pesticide use, method and frequency of tillage. We monitored the ecological state of 52 agricultural plots to test the performance of the index. The variables were hierarchically aggregated with simple mathematical algorithms, if-then rules, and rule-based fuzzy models, yielding the final multi-criteria index with values from 0 (worst) to 1 (best conditions). We validated the model through independent evaluation by experts, and we obtained a linear regression with an r2 = 0.61 (p = 2.4e-06, d.f. = 49) between index output and the experts' evaluation.
A Multi-Criteria Index for Ecological Evaluation of Tropical Agriculture in Southeastern Mexico

PubMed Central

Huerta, Esperanza; Kampichler, Christian; Ochoa-Gaona, Susana; De Jong, Ben; Hernandez-Daumas, Salvador; Geissen, Violette

2014-01-01

The aim of this study was to generate an easy to use index to evaluate the ecological state of agricultural land from a sustainability perspective. We selected environmental indicators, such as the use of organic soil amendments (green manure) versus chemical fertilizers, plant biodiversity (including crop associations), variables which characterize soil conservation of conventional agricultural systems, pesticide use, method and frequency of tillage. We monitored the ecological state of 52 agricultural plots to test the performance of the index. The variables were hierarchically aggregated with simple mathematical algorithms, if-then rules, and rule-based fuzzy models, yielding the final multi-criteria index with values from 0 (worst) to 1 (best conditions). We validated the model through independent evaluation by experts, and we obtained a linear regression with an r2 = 0.61 (p = 2.4e-06, d.f. = 49) between index output and the experts’ evaluation. PMID:25405980
Very-short-term wind power prediction by a hybrid model with single- and multi-step approaches

NASA Astrophysics Data System (ADS)

Mohammed, E.; Wang, S.; Yu, J.

2017-05-01

Very-short-term wind power prediction (VSTWPP) has played an essential role for the operation of electric power systems. This paper aims at improving and applying a hybrid method of VSTWPP based on historical data. The hybrid method is combined by multiple linear regressions and least square (MLR&LS), which is intended for reducing prediction errors. The predicted values are obtained through two sub-processes:1) transform the time-series data of actual wind power into the power ratio, and then predict the power ratio;2) use the predicted power ratio to predict the wind power. Besides, the proposed method can include two prediction approaches: single-step prediction (SSP) and multi-step prediction (MSP). WPP is tested comparatively by auto-regressive moving average (ARMA) model from the predicted values and errors. The validity of the proposed hybrid method is confirmed in terms of error analysis by using probability density function (PDF), mean absolute percent error (MAPE) and means square error (MSE). Meanwhile, comparison of the correlation coefficients between the actual values and the predicted values for different prediction times and window has confirmed that MSP approach by using the hybrid model is the most accurate while comparing to SSP approach and ARMA. The MLR&LS is accurate and promising for solving problems in WPP.
Estimation of Standard Error of Regression Effects in Latent Regression Models Using Binder's Linearization. Research Report. ETS RR-07-09

ERIC Educational Resources Information Center

Li, Deping; Oranje, Andreas

2007-01-01

Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

NASA Technical Reports Server (NTRS)

Ulbrich, Norbert Manfred; Volden, Thomas R.

2010-01-01

The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
SAND: an automated VLBI imaging and analysing pipeline - I. Stripping component trajectories

NASA Astrophysics Data System (ADS)

Zhang, M.; Collioud, A.; Charlot, P.

2018-02-01

We present our implementation of an automated very long baseline interferometry (VLBI) data-reduction pipeline that is dedicated to interferometric data imaging and analysis. The pipeline can handle massive VLBI data efficiently, which makes it an appropriate tool to investigate multi-epoch multiband VLBI data. Compared to traditional manual data reduction, our pipeline provides more objective results as less human interference is involved. The source extraction is carried out in the image plane, while deconvolution and model fitting are performed in both the image plane and the uv plane for parallel comparison. The output from the pipeline includes catalogues of CLEANed images and reconstructed models, polarization maps, proper motion estimates, core light curves and multiband spectra. We have developed a regression STRIP algorithm to automatically detect linear or non-linear patterns in the jet component trajectories. This algorithm offers an objective method to match jet components at different epochs and to determine their proper motions.
A method for fitting regression splines with varying polynomial order in the linear mixed model.

PubMed

Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

2006-02-15

The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
Improving medium-range ensemble streamflow forecasts through statistical post-processing

NASA Astrophysics Data System (ADS)

Mendoza, Pablo; Wood, Andy; Clark, Elizabeth; Nijssen, Bart; Clark, Martyn; Ramos, Maria-Helena; Nowak, Kenneth; Arnold, Jeffrey

2017-04-01

Probabilistic hydrologic forecasts are a powerful source of information for decision-making in water resources operations. A common approach is the hydrologic model-based generation of streamflow forecast ensembles, which can be implemented to account for different sources of uncertainties - e.g., from initial hydrologic conditions (IHCs), weather forecasts, and hydrologic model structure and parameters. In practice, hydrologic ensemble forecasts typically have biases and spread errors stemming from errors in the aforementioned elements, resulting in a degradation of probabilistic properties. In this work, we compare several statistical post-processing techniques applied to medium-range ensemble streamflow forecasts obtained with the System for Hydromet Applications, Research and Prediction (SHARP). SHARP is a fully automated prediction system for the assessment and demonstration of short-term to seasonal streamflow forecasting applications, developed by the National Center for Atmospheric Research, University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation. The suite of post-processing techniques includes linear blending, quantile mapping, extended logistic regression, quantile regression, ensemble analogs, and the generalized linear model post-processor (GLMPP). We assess and compare these techniques using multi-year hindcasts in several river basins in the western US. This presentation discusses preliminary findings about the effectiveness of the techniques for improving probabilistic skill, reliability, discrimination, sharpness and resolution.
Modeling the relationship between photosynthetically active radiation and global horizontal irradiance using singular spectrum analysis

NASA Astrophysics Data System (ADS)

Zempila, Melina-Maria; Taylor, Michael; Bais, Alkiviadis; Kazadzis, Stelios

2016-10-01

We report on the construction of generic models to calculate photosynthetically active radiation (PAR) from global horizontal irradiance (GHI), and vice versa. Our study took place at stations of the Greek UV network (UVNET) and the Hellenic solar energy network (HNSE) with measurements from NILU-UV multi-filter radiometers and CM pyranometers, chosen due to their long (≈1 M record/site) high temporal resolution (≈1 min) record that captures a broad range of atmospheric environments and cloudiness conditions. The uncertainty of the PAR measurements is quantified to be ±6.5% while the uncertainty involved in GHI measurements is up to ≈±7% according to the manufacturer. We show how multi-linear regression and nonlinear neural network (NN) models, trained at a calibration site (Thessaloniki) can be made generic provided that the input-output time series are processed with multi-channel singular spectrum analysis (M-SSA). Without M-SSA, both linear and nonlinear models perform well only locally. M-SSA with 50 time-lags is found to be sufficient for identification of trend, periodic and noise components in aerosol, cloud parameters and irradiance, and to construct regularized noise models of PAR from GHI irradiances. Reconstructed PAR and GHI time series capture ≈95% of the variance of the cross-validated target measurements and have median absolute percentage errors <2%. The intra-site median absolute error of M-SSA processed models were ≈8.2±1.7 W/m2 for PAR and ≈9.2±4.2 W/m2 for GHI. When applying the models trained at Thessaloniki to other stations, the average absolute mean bias between the model estimates and measured values was found to be ≈1.2 W/m2 for PAR and ≈0.8 W/m2 for GHI. For the models, percentage errors are well within the uncertainty of the measurements at all sites. Generic NN models were found to perform marginally better than their linear counterparts.
Multi-objective Optimization of Solar Irradiance and Variance at Pertinent Inclination Angles

NASA Astrophysics Data System (ADS)

Jain, Dhanesh; Lalwani, Mahendra

2018-05-01

The performance of photovoltaic panel gets highly affected bychange in atmospheric conditions and angle of inclination. This article evaluates the optimum tilt angle and orientation angle (surface azimuth angle) for solar photovoltaic array in order to get maximum solar irradiance and to reduce variance of radiation at different sets or subsets of time periods. Non-linear regression and adaptive neural fuzzy interference system (ANFIS) methods are used for predicting the solar radiation. The results of ANFIS are more accurate in comparison to non-linear regression. These results are further used for evaluating the correlation and applied for estimating the optimum combination of tilt angle and orientation angle with the help of general algebraic modelling system and multi-objective genetic algorithm. The hourly average solar irradiation is calculated at different combinations of tilt angle and orientation angle with the help of horizontal surface radiation data of Jodhpur (Rajasthan, India). The hourly average solar irradiance is calculated for three cases: zero variance, with actual variance and with double variance at different time scenarios. It is concluded that monthly collected solar radiation produces better result as compared to bimonthly, seasonally, half-yearly and yearly collected solar radiation. The profit obtained for monthly varying angle has 4.6% more with zero variance and 3.8% more with actual variance, than the annually fixed angle.
Searching for the main anti-bacterial components in artificial Calculus bovis using UPLC and microcalorimetry coupled with multi-linear regression analysis.

PubMed

Zang, Qing-Ce; Wang, Jia-Bo; Kong, Wei-Jun; Jin, Cheng; Ma, Zhi-Jie; Chen, Jing; Gong, Qian-Feng; Xiao, Xiao-He

2011-12-01

The fingerprints of artificial Calculus bovis extracts from different solvents were established by ultra-performance liquid chromatography (UPLC) and the anti-bacterial activities of artificial C. bovis extracts on Staphylococcus aureus (S. aureus) growth were studied by microcalorimetry. The UPLC fingerprints were evaluated using hierarchical clustering analysis. Some quantitative parameters obtained from the thermogenic curves of S. aureus growth affected by artificial C. bovis extracts were analyzed using principal component analysis. The spectrum-effect relationships between UPLC fingerprints and anti-bacterial activities were investigated using multi-linear regression analysis. The results showed that peak 1 (taurocholate sodium), peak 3 (unknown compound), peak 4 (cholic acid), and peak 6 (chenodeoxycholic acid) are more significant than the other peaks with the standard parameter estimate 0.453, -0.166, 0.749, 0.025, respectively. So, compounds cholic acid, taurocholate sodium, and chenodeoxycholic acid might be the major anti-bacterial components in artificial C. bovis. Altogether, this work provides a general model of the combination of UPLC chromatography and anti-bacterial effect to study the spectrum-effect relationships of artificial C. bovis extracts, which can be used to discover the main anti-bacterial components in artificial C. bovis or other Chinese herbal medicines with anti-bacterial effects. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Element enrichment factor calculation using grain-size distribution and functional data regression.

PubMed

Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R

2015-01-01

In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Investigating the detection of multi-homed devices independent of operating systems

DTIC Science & Technology

2017-09-01

timestamp data was used to estimate clock skews using linear regression and linear optimization methods. Analysis revealed that detection depends on...the consistency of the estimated clock skew. Through vertical testing, it was also shown that clock skew consistency depends on the installed...optimization methods. Analysis revealed that detection depends on the consistency of the estimated clock skew. Through vertical testing, it was also
Data Assimilation and Propagation of Uncertainty in Multiscale Cardiovascular Simulation

NASA Astrophysics Data System (ADS)

Schiavazzi, Daniele; Marsden, Alison

2015-11-01

Cardiovascular modeling is the application of computational tools to predict hemodynamics. State-of-the-art techniques couple a 3D incompressible Navier-Stokes solver with a boundary circulation model and can predict local and peripheral hemodynamics, analyze the post-operative performance of surgical designs and complement clinical data collection minimizing invasive and risky measurement practices. The ability of these tools to make useful predictions is directly related to their accuracy in representing measured physiologies. Tuning of model parameters is therefore a topic of paramount importance and should include clinical data uncertainty, revealing how this uncertainty will affect the predictions. We propose a fully Bayesian, multi-level approach to data assimilation of uncertain clinical data in multiscale circulation models. To reduce the computational cost, we use a stable, condensed approximation of the 3D model build by linear sparse regression of the pressure/flow rate relationship at the outlets. Finally, we consider the problem of non-invasively propagating the uncertainty in model parameters to the resulting hemodynamics and compare Monte Carlo simulation with Stochastic Collocation approaches based on Polynomial or Multi-resolution Chaos expansions.
A Linear Regression and Markov Chain Model for the Arabian Horse Registry

DTIC Science & Technology

1993-04-01

as a tax deduction? Yes No T-4367 68 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be... (Mark one...E L T-4367 A Linear Regression and Markov Chain Model For the Arabian Horse Registry Accesion For NTIS CRA&I UT 7 4:iC=D 5 D-IC JA" LI J:13tjlC,3 lO...the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses . A linear regression model was utilized to
ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches.

PubMed

Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K

2017-01-01

The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better ( R 2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better ( R 2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

PubMed Central

Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.

2017-01-01

The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969
A primer for biomedical scientists on how to execute model II linear regression analysis.

PubMed

Ludbrook, John

2012-04-01

1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
Linear regression in astronomy. II

NASA Technical Reports Server (NTRS)

Feigelson, Eric D.; Babu, Gutti J.

1992-01-01

A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Estimating extent of mortality associated with the Douglas-fir beetle in the Central and Northern Rockies

Treesearch

Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini

1999-01-01

Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
Electricity Consumption in the Industrial Sector of Jordan: Application of Multivariate Linear Regression and Adaptive Neuro-Fuzzy Techniques

NASA Astrophysics Data System (ADS)

Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.

2009-08-01

In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Correcting for population structure and kinship using the linear mixed model: theory and extensions.

PubMed

Hoffman, Gabriel E

2013-01-01

Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

PubMed

Karabatsos, George

2017-02-01

Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

PubMed

Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

2016-04-01

Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Fourier transform infrared reflectance spectra of latent fingerprints: a biometric gauge for the age of an individual.

PubMed

Hemmila, April; McGill, Jim; Ritter, David

2008-03-01

To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Applying the methodology of Design of Experiments to stability studies: a Partial Least Squares approach for evaluation of drug stability.

PubMed

Jordan, Nika; Zakrajšek, Jure; Bohanec, Simona; Roškar, Robert; Grabnar, Iztok

2018-05-01

The aim of the present research is to show that the methodology of Design of Experiments can be applied to stability data evaluation, as they can be seen as multi-factor and multi-level experimental designs. Linear regression analysis is usually an approach for analyzing stability data, but multivariate statistical methods could also be used to assess drug stability during the development phase. Data from a stability study for a pharmaceutical product with hydrochlorothiazide (HCTZ) as an unstable drug substance was used as a case example in this paper. The design space of the stability study was modeled using Umetrics MODDE 10.1 software. We showed that a Partial Least Squares model could be used for a multi-dimensional presentation of all data generated in a stability study and for determination of the relationship among factors that influence drug stability. It might also be used for stability predictions and potentially for the optimization of the extent of stability testing needed to determine shelf life and storage conditions, which would be time and cost-effective for the pharmaceutical industry.
Effectiveness of a worksite mindfulness-based multi-component intervention on lifestyle behaviors

PubMed Central

2014-01-01

Introduction Overweight and obesity are associated with an increased risk of morbidity. Mindfulness training could be an effective strategy to optimize lifestyle behaviors related to body weight gain. The aim of this study was to evaluate the effectiveness of a worksite mindfulness-based multi-component intervention on vigorous physical activity in leisure time, sedentary behavior at work, fruit intake and determinants of these behaviors. The control group received information on existing lifestyle behavior- related facilities that were already available at the worksite. Methods In a randomized controlled trial design (n = 257), 129 workers received a mindfulness training, followed by e-coaching, lunch walking routes and fruit. Outcome measures were assessed at baseline and after 6 and 12 months using questionnaires. Physical activity was also measured using accelerometers. Effects were analyzed using linear mixed effect models according to the intention-to-treat principle. Linear regression models (complete case analyses) were used as sensitivity analyses. Results There were no significant differences in lifestyle behaviors and determinants of these behaviors between the intervention and control group after 6 or 12 months. The sensitivity analyses showed effect modification for gender in sedentary behavior at work at 6-month follow-up, although the main analyses did not. Conclusions This study did not show an effect of a worksite mindfulness-based multi-component intervention on lifestyle behaviors and behavioral determinants after 6 and 12 months. The effectiveness of a worksite mindfulness-based multi-component intervention as a health promotion intervention for all workers could not be established. PMID:24467802
Orthogonal Regression: A Teaching Perspective

ERIC Educational Resources Information Center

Carr, James R.

2012-01-01

A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Integration of least angle regression with empirical Bayes for multi-locus genome-wide association studies

USDA-ARS?s Scientific Manuscript database

Multi-locus genome-wide association studies has become the state-of-the-art procedure to identify quantitative trait loci (QTL) associated with traits simultaneously. However, implementation of multi-locus model is still difficult. In this study, we integrated least angle regression with empirical B...
Using Linear Equating to Map PROMIS(®) Global Health Items and the PROMIS-29 V2.0 Profile Measure to the Health Utilities Index Mark 3.

PubMed

Hays, Ron D; Revicki, Dennis A; Feeny, David; Fayers, Peter; Spritzer, Karen L; Cella, David

2016-10-01

Preference-based health-related quality of life (HR-QOL) scores are useful as outcome measures in clinical studies, for monitoring the health of populations, and for estimating quality-adjusted life-years. This was a secondary analysis of data collected in an internet survey as part of the Patient-Reported Outcomes Measurement Information System (PROMIS(®)) project. To estimate Health Utilities Index Mark 3 (HUI-3) preference scores, we used the ten PROMIS(®) global health items, the PROMIS-29 V2.0 single pain intensity item and seven multi-item scales (physical functioning, fatigue, pain interference, depressive symptoms, anxiety, ability to participate in social roles and activities, sleep disturbance), and the PROMIS-29 V2.0 items. Linear regression analyses were used to identify significant predictors, followed by simple linear equating to avoid regression to the mean. The regression models explained 48 % (global health items), 61 % (PROMIS-29 V2.0 scales), and 64 % (PROMIS-29 V2.0 items) of the variance in the HUI-3 preference score. Linear equated scores were similar to observed scores, although differences tended to be larger for older study participants. HUI-3 preference scores can be estimated from the PROMIS(®) global health items or PROMIS-29 V2.0. The estimated HUI-3 scores from the PROMIS(®) health measures can be used for economic applications and as a measure of overall HR-QOL in research.
Analyzing Multilevel Data: Comparing Findings from Hierarchical Linear Modeling and Ordinary Least Squares Regression

ERIC Educational Resources Information Center

Rocconi, Louis M.

2013-01-01

This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…

Pan evaporation modeling using six different heuristic computing methods in different climates of China

NASA Astrophysics Data System (ADS)

Wang, Lunche; Kisi, Ozgur; Zounemat-Kermani, Mohammad; Li, Hui

2017-01-01

Pan evaporation (Ep) plays important roles in agricultural water resources management. One of the basic challenges is modeling Ep using limited climatic parameters because there are a number of factors affecting the evaporation rate. This study investigated the abilities of six different soft computing methods, multi-layer perceptron (MLP), generalized regression neural network (GRNN), fuzzy genetic (FG), least square support vector machine (LSSVM), multivariate adaptive regression spline (MARS), adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP), and two regression methods, multiple linear regression (MLR) and Stephens and Stewart model (SS) in predicting monthly Ep. Long-term climatic data at various sites crossing a wide range of climates during 1961-2000 are used for model development and validation. The results showed that the models have different accuracies in different climates and the MLP model performed superior to the other models in predicting monthly Ep at most stations using local input combinations (for example, the MAE (mean absolute errors), RMSE (root mean square errors), and determination coefficient (R2) are 0.314 mm/day, 0.405 mm/day and 0.988, respectively for HEB station), while GRNN model performed better in Tibetan Plateau (MAE, RMSE and R2 are 0.459 mm/day, 0.592 mm/day and 0.932, respectively). The accuracies of above models ranked as: MLP, GRNN, LSSVM, FG, ANFIS-GP, MARS and MLR. The overall results indicated that the soft computing techniques generally performed better than the regression methods, but MLR and SS models can be more preferred at some climatic zones instead of complex nonlinear models, for example, the BJ (Beijing), CQ (Chongqing) and HK (Haikou) stations. Therefore, it can be concluded that Ep could be successfully predicted using above models in hydrological modeling studies.
Artificial Neural Networks: A Novel Approach to Analysing the Nutritional Ecology of a Blowfly Species, Chrysomya megacephala

PubMed Central

Bianconi, André; Zuben, Cláudio J. Von; Serapião, Adriane B. de S.; Govone, José S.

2010-01-01

Bionomic features of blowflies may be clarified and detailed by the deployment of appropriate modelling techniques such as artificial neural networks, which are mathematical tools widely applied to the resolution of complex biological problems. The principal aim of this work was to use three well-known neural networks, namely Multi-Layer Perceptron (MLP), Radial Basis Function (RBF), and Adaptive Neural Network-Based Fuzzy Inference System (ANFIS), to ascertain whether these tools would be able to outperform a classical statistical method (multiple linear regression) in the prediction of the number of resultant adults (survivors) of experimental populations of Chrysomya megacephala (F.) (Diptera: Calliphoridae), based on initial larval density (number of larvae), amount of available food, and duration of immature stages. The coefficient of determination (R2) derived from the RBF was the lowest in the testing subset in relation to the other neural networks, even though its R2 in the training subset exhibited virtually a maximum value. The ANFIS model permitted the achievement of the best testing performance. Hence this model was deemed to be more effective in relation to MLP and RBF for predicting the number of survivors. All three networks outperformed the multiple linear regression, indicating that neural models could be taken as feasible techniques for predicting bionomic variables concerning the nutritional dynamics of blowflies. PMID:20569135
Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

PubMed

Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

2014-01-01

The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binominal distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
Walkability and cardiometabolic risk factors: Cross-sectional and longitudinal associations from the Multi-Ethnic Study of Atherosclerosis.

PubMed

Braun, Lindsay M; Rodríguez, Daniel A; Evenson, Kelly R; Hirsch, Jana A; Moore, Kari A; Diez Roux, Ana V

2016-05-01

We used data from 3227 older adults in the Multi-Ethnic Study of Atherosclerosis (2004-2012) to explore cross-sectional and longitudinal associations between walkability and cardiometabolic risk factors. In cross-sectional analyses, linear regression was used to estimate associations of Street Smart Walk Score® with glucose, triglycerides, HDL and LDL cholesterol, systolic and diastolic blood pressure, and waist circumference, while logistic regression was used to estimate associations with odds of metabolic syndrome. Econometric fixed effects models were used to estimate longitudinal associations of changes in walkability with changes in each risk factor among participants who moved residential locations between 2004 and 2012 (n=583). Most cross-sectional and longitudinal associations were small and statistically non-significant. We found limited evidence that higher walkability was cross-sectionally associated with lower blood pressure but that increases in walkability were associated with increases in triglycerides and blood pressure over time. Further research over longer time periods is needed to understand the potential for built environment interventions to improve cardiometabolic health. Copyright © 2016 Elsevier Ltd. All rights reserved.
Wavelet regression model in forecasting crude oil price

NASA Astrophysics Data System (ADS)

Hamid, Mohd Helmie; Shabri, Ani

2017-05-01

This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.
INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

EPA Science Inventory

Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
USING LINEAR AND POLYNOMIAL MODELS TO EXAMINE THE ENVIRONMENTAL STABILITY OF VIRUSES

EPA Science Inventory

The article presents the development of model equations for describing the fate of viral infectivity in environmental samples. Most of the models were based upon the use of a two-step linear regression approach. The first step employs regression of log base 10 transformed viral t...
Multi-Target Regression via Robust Low-Rank Learning.

PubMed

Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo

2018-02-01

Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
Linear regression metamodeling as a tool to summarize and present simulation model results.

PubMed

Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

2013-10-01

Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
Scoring and staging systems using cox linear regression modeling and recursive partitioning.

PubMed

Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H

2006-01-01

Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
[Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

PubMed

Ling, Ru; Liu, Jiawang

2011-12-01

To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
Effect of linear and non-linear blade modelling techniques on simulated fatigue and extreme loads using Bladed

NASA Astrophysics Data System (ADS)

Beardsell, Alec; Collier, William; Han, Tao

2016-09-01

There is a trend in the wind industry towards ever larger and more flexible turbine blades. Blade tip deflections in modern blades now commonly exceed 10% of blade length. Historically, the dynamic response of wind turbine blades has been analysed using linear models of blade deflection which include the assumption of small deflections. For modern flexible blades, this assumption is becoming less valid. In order to continue to simulate dynamic turbine performance accurately, routine use of non-linear models of blade deflection may be required. This can be achieved by representing the blade as a connected series of individual flexible linear bodies - referred to in this paper as the multi-part approach. In this paper, Bladed is used to compare load predictions using single-part and multi-part blade models for several turbines. The study examines the impact on fatigue and extreme loads and blade deflection through reduced sets of load calculations based on IEC 61400-1 ed. 3. Damage equivalent load changes of up to 16% and extreme load changes of up to 29% are observed at some turbine load locations. It is found that there is no general pattern in the loading differences observed between single-part and multi-part blade models. Rather, changes in fatigue and extreme loads with a multi-part blade model depend on the characteristics of the individual turbine and blade. Key underlying causes of damage equivalent load change are identified as differences in edgewise- torsional coupling between the multi-part and single-part models, and increased edgewise rotor mode damping in the multi-part model. Similarly, a causal link is identified between torsional blade dynamics and changes in ultimate load results.
Self-consistent asset pricing models

NASA Astrophysics Data System (ADS)

Malevergne, Y.; Sornette, D.

2007-08-01

We discuss the foundations of factor or regression models in the light of the self-consistency condition that the market portfolio (and more generally the risk factors) is (are) constituted of the assets whose returns it is (they are) supposed to explain. As already reported in several articles, self-consistency implies correlations between the return disturbances. As a consequence, the alphas and betas of the factor model are unobservable. Self-consistency leads to renormalized betas with zero effective alphas, which are observable with standard OLS regressions. When the conditions derived from internal consistency are not met, the model is necessarily incomplete, which means that some sources of risk cannot be replicated (or hedged) by a portfolio of stocks traded on the market, even for infinite economies. Analytical derivations and numerical simulations show that, for arbitrary choices of the proxy which are different from the true market portfolio, a modified linear regression holds with a non-zero value αi at the origin between an asset i's return and the proxy's return. Self-consistency also introduces “orthogonality” and “normality” conditions linking the betas, alphas (as well as the residuals) and the weights of the proxy portfolio. Two diagnostics based on these orthogonality and normality conditions are implemented on a basket of 323 assets which have been components of the S&P500 in the period from January 1990 to February 2005. These two diagnostics show interesting departures from dynamical self-consistency starting about 2 years before the end of the Internet bubble. Assuming that the CAPM holds with the self-consistency condition, the OLS method automatically obeys the resulting orthogonality and normality conditions and therefore provides a simple way to self-consistently assess the parameters of the model by using proxy portfolios made only of the assets which are used in the CAPM regressions. Finally, the factor decomposition with the self-consistency condition derives a risk-factor decomposition in the multi-factor case which is identical to the principal component analysis (PCA), thus providing a direct link between model-driven and data-driven constructions of risk factors. This correspondence shows that PCA will therefore suffer from the same limitations as the CAPM and its multi-factor generalization, namely lack of out-of-sample explanatory power and predictability. In the multi-period context, the self-consistency conditions force the betas to be time-dependent with specific constraints.
Pseudo second order kinetics and pseudo isotherms for malachite green onto activated carbon: comparison of linear and non-linear regression methods.

PubMed

Kumar, K Vasanth; Sivanesan, S

2006-08-25

Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear method. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo second order model were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression method showed that Ho pseudo second order model was a better kinetic expression when compared to other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear method to obtain the pseudo isotherms. The best fitting pseudo isotherm was found to be the Langmuir and Redlich Peterson isotherm. Redlich Peterson is a special case of Langmuir when the constant g equals unity.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Qichun; Zhang, Xuesong; Xu, Xingya

Riverine carbon cycling is an important, but insufficiently investigated component of the global carbon cycle. Analyses of environmental controls on riverine carbon cycling are critical for improved understanding of mechanisms regulating carbon processing and storage along the terrestrial-aquatic continuum. Here, we compile and analyze riverine dissolved organic carbon (DOC) concentration data from 1402 United States Geological Survey (USGS) gauge stations to examine the spatial variability and environmental controls of DOC concentrations in the United States (U.S.) surface waters. DOC concentrations exhibit high spatial variability, with an average of 6.42 ± 6.47 mg C/ L (Mean ± Standard Deviation). In general,more » high DOC concentrations occur in the Upper Mississippi River basin and the Southeastern U.S., while low concentrations are mainly distributed in the Western U.S. Single-factor analysis indicates that slope of drainage areas, wetlands, forests, percentage of first-order streams, and instream nutrients (such as nitrogen and phosphorus) pronouncedly influence DOC concentrations, but the explanatory power of each bivariate model is lower than 35%. Analyses based on the general multi-linear regression models suggest DOC concentrations are jointly impacted by multiple factors. Soil properties mainly show positive correlations with DOC concentrations; forest and shrub lands have positive correlations with DOC concentrations, but urban area and croplands demonstrate negative impacts; total instream phosphorus and dam density correlate positively with DOC concentrations. Notably, the relative importance of these environmental controls varies substantially across major U.S. water resource regions. In addition, DOC concentrations and environmental controls also show significant variability from small streams to large rivers, which may be caused by changing carbon sources and removal rates by river orders. In sum, our results reveal that general multi-linear regression analysis of twenty one terrestrial and aquatic environmental factors can partially explain (56%) the DOC concentration variation. In conclusion, this study highlights the complexity of the interactions among these environmental factors in determining DOC concentrations, thus calls for processes-based, non-linear methodologies to constrain uncertainties in riverine DOC cycling.« less
Spatially resolved regression analysis of pre-treatment FDG, FLT and Cu-ATSM PET from post-treatment FDG PET: an exploratory study

PubMed Central

Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert

2012-01-01

Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, p<0.001). Univariate mixture model fits of FDGpre improved R2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
A road map for multi-way calibration models.

PubMed

Escandar, Graciela M; Olivieri, Alejandro C

2017-08-07

A large number of experimental applications of multi-way calibration are known, and a variety of chemometric models are available for the processing of multi-way data. While the main focus has been directed towards three-way data, due to the availability of various instrumental matrix measurements, a growing number of reports are being produced on order signals of increasing complexity. The purpose of this review is to present a general scheme for selecting the appropriate data processing model, according to the properties exhibited by the multi-way data. In spite of the complexity of the multi-way instrumental measurements, simple criteria can be proposed for model selection, based on the presence and number of the so-called multi-linearity breaking modes (instrumental modes that break the low-rank multi-linearity of the multi-way arrays), and also on the existence of mutually dependent instrumental modes. Recent literature reports on multi-way calibration are reviewed, with emphasis on the models that were selected for data processing.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

PubMed

Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

2016-03-01

In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Classical Testing in Functional Linear Models.

PubMed

Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

2016-01-01

We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
Classical Testing in Functional Linear Models

PubMed Central

Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

2016-01-01

We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155

Dosing algorithm for warfarin using CYP2C9 and VKORC1 genotyping from a multi-ethnic population: comparison with other equations.

PubMed

Wu, Alan H B; Wang, Ping; Smith, Andrew; Haller, Christine; Drake, Katherine; Linder, Mark; Valdes, Roland

2008-02-01

Polymorphism in the genes for cytochrome (CYP)2C9 and the vitamin K epoxide reductase complex subunit 1 (VKORC1) affect the pharmacokinetics and pharmacodynamics of warfarin. We developed and validated a warfarin-dosing algorithm for a multi-ethnic population that predicts the best dose for stable anticoagulation, and compared its performance against other regression equations. We determined the allele and haplotype frequencies of genes for CYP2C9 and VKORC1 on 167 Caucasian, African-American, Asian and Hispanic patients on warfarin. On a subset where complete data were available (n=92), we developed a dosing equation that predicts the actual dose needed to maintain target anticoagulation using demographic variables and genotypes. This regression was validated against an independent group of subjects. We also applied our data to five other published warfarin-dosing equations. The allele frequency for CYP2C9*2 and *3 and the A allele for VKORC1 3673 was similar to previously published reports. For Caucasians and Asians, VKORC1 SNPs were in Hardy-Weinberg linkage equilibrium. Some VKORC1 SNPs among the African-American population and one SNP among Hispanics were not in equilibrium. The linear regression of predicted versus actual warfarin dose produced r-values of 0.71 for the training set and 0.67 for the validation set. The regression coefficient improved (to r=0.78 and 0.75, respectively) when rare genotypes were eliminated or when the 7566 VKORC1 genotype was added to the model. All of the regression models tested produced a similar degree of correlation. The exclusion of rare genotypes that are more associated with certain ethnicities improved the model. Minor improvements in algorithms can be observed with the inclusion of ethnicity and more CYP2C9 and VKORC1 SNPs as variables. Major improvements will likely require the identification of new gene associations with warfarin dosing.
Detailed emission profiles for on-road vehicles derived from ambient measurements during a windless traffic episode in Baltimore using a multi-model approach

NASA Astrophysics Data System (ADS)

Ke, Haohao; Ondov, John M.; Rogge, Wolfgang F.

2013-12-01

Composite chemical profiles of motor vehicle emissions were extracted from ambient measurements at a near-road site in Baltimore during a windless traffic episode in November, 2002, using four independent approaches, i.e., simple peak analysis, windless model-based linear regression, PMF, and UNMIX. Although the profiles are in general agreement, the windless-model-based profile treatment more effectively removes interference from non-traffic sources and is deemed to be more accurate for many species. In addition to abundances of routine pollutants (e.g., NOx, CO, PM2.5, EC, OC, sulfate, and nitrate), 11 particle-bound metals and 51 individual traffic-related organic compounds (including n-alkanes, PAHs, oxy-PAHs, hopanes, alkylcyclohexanes, and others) were included in the modeling.
Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

ERIC Educational Resources Information Center

Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

2013-01-01

Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.

PubMed

Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S

2017-06-01

The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET d ) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
Horses Auto-Recruit Their Lungs by Inspiratory Breath Holding Following Recovery from General Anaesthesia

PubMed Central

Mosing, Martina; Waldmann, Andreas D.; MacFarlane, Paul; Iff, Samuel; Auer, Ulrike; Bohm, Stephan H.; Bettschart-Wolfensberger, Regula; Bardell, David

2016-01-01

This study evaluated the breathing pattern and distribution of ventilation in horses prior to and following recovery from general anaesthesia using electrical impedance tomography (EIT). Six horses were anaesthetised for 6 hours in dorsal recumbency. Arterial blood gas and EIT measurements were performed 24 hours before (baseline) and 1, 2, 3, 4, 5 and 6 hours after horses stood following anaesthesia. At each time point 4 representative spontaneous breaths were analysed. The percentage of the total breath length during which impedance remained greater than 50% of the maximum inspiratory impedance change (breath holding), the fraction of total tidal ventilation within each of four stacked regions of interest (ROI) (distribution of ventilation) and the filling time and inflation period of seven ROI evenly distributed over the dorso-ventral height of the lungs were calculated. Mixed effects multi-linear regression and linear regression were used and significance was set at p<0.05. All horses demonstrated inspiratory breath holding until 5 hours after standing. No change from baseline was seen for the distribution of ventilation during inspiration. Filling time and inflation period were more rapid and shorter in ventral and slower and longer in most dorsal ROI compared to baseline, respectively. In a mixed effects multi-linear regression, breath holding was significantly correlated with PaCO2 in both the univariate and multivariate regression. Following recovery from anaesthesia, horses showed inspiratory breath holding during which gas redistributed from ventral into dorsal regions of the lungs. This suggests auto-recruitment of lung tissue which would have been dependent and likely atelectic during anaesthesia. PMID:27331910
Parameterized data-driven fuzzy model based optimal control of a semi-batch reactor.

PubMed

Kamesh, Reddi; Rani, K Yamuna

2016-09-01

A parameterized data-driven fuzzy (PDDF) model structure is proposed for semi-batch processes, and its application for optimal control is illustrated. The orthonormally parameterized input trajectories, initial states and process parameters are the inputs to the model, which predicts the output trajectories in terms of Fourier coefficients. Fuzzy rules are formulated based on the signs of a linear data-driven model, while the defuzzification step incorporates a linear regression model to shift the domain from input to output domain. The fuzzy model is employed to formulate an optimal control problem for single rate as well as multi-rate systems. Simulation study on a multivariable semi-batch reactor system reveals that the proposed PDDF modeling approach is capable of capturing the nonlinear and time-varying behavior inherent in the semi-batch system fairly accurately, and the results of operating trajectory optimization using the proposed model are found to be comparable to the results obtained using the exact first principles model, and are also found to be comparable to or better than parameterized data-driven artificial neural network model based optimization results. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Resting Heart Rate as Predictor for Left Ventricular Dysfunction and Heart Failure: The Multi-Ethnic Study of Atherosclerosis

PubMed Central

Opdahl, Anders; Venkatesh, Bharath Ambale; Fernandes, Veronica R. S.; Wu, Colin O.; Nasir, Khurram; Choi, Eui-Young; Almeida, Andre L. C.; Rosen, Boaz; Carvalho, Benilton; Edvardsen, Thor; Bluemke, David A.; Lima, Joao A. C.

2014-01-01

OBJECTIVE To investigate the relationship between baseline resting heart rate and incidence of heart failure (HF) and global and regional left ventricular (LV) dysfunction. BACKGROUND The association of resting heart rate to HF and LV function is not well described in an asymptomatic multi-ethnic population. METHODS Participants in the Multi-Ethnic Study of Atherosclerosis had resting heart rate measured at inclusion. Incident HF was registered (n=176) during follow-up (median 7 years) in those who underwent cardiac MRI (n=5000). Changes in ejection fraction (ΔEF) and peak circumferential strain (Δεcc) were measured as markers of developing global and regional LV dysfunction in 1056 participants imaged at baseline and 5 years later. Time to HF (Cox model) and Δεcc and ΔEF (multiple linear regression models) were adjusted for demographics, traditional cardiovascular risk factors, calcium score, LV end-diastolic volume and mass in addition to resting heart rate. RESULTS Cox analysis demonstrated that for 1 bpm increase in resting heart rate there was a 4% greater adjusted relative risk for incident HF (Hazard Ratio: 1.04 (1.02, 1.06 (95% CI); P<0.001). Adjusted multiple regression models demonstrated that resting heart rate was positively associated with deteriorating εcc and decrease in EF, even in analyses when all coronary heart disease events were excluded from the model. CONCLUSION Elevated resting heart rate is associated with increased risk for incident HF in asymptomatic participants in MESA. Higher heart rate is related to development of regional and global LV dysfunction independent of subclinical atherosclerosis and coronary heart disease. PMID:24412444
An automated ranking platform for machine learning regression models for meat spoilage prediction using multi-spectral imaging and metabolic profiling.

PubMed

Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady

2017-09-01

Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imagining and biomimetic sensors started gaining popularity as rapid and efficient methods for assessing food quality, safety and authentication; as a sensible alternative to the expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithms before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application is presented, able to automate the procedure of identifying the best machine learning method for comparing data from several analytical techniques, to predict the counts of microorganisms responsible of meat spoilage regardless of the packaging system applied. In particularly up to 7 regression methods were applied and these are ordinary least squares regression, stepwise linear regression, partial least square regression, principal component regression, support vector regression, random forest and k-nearest neighbours. MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with electronic nose, HPLC, FT-IR, GC-MS and Multispectral imaging instrument. Population of total viable count, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta, were predicted. As a result, recommendations of which analytical platforms are suitable to predict each type of bacteria and which machine learning methods to use in each case were obtained. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
Using Parametric Cost Models to Estimate Engineering and Installation Costs of Selected Electronic Communications Systems

DTIC Science & Technology

1994-09-01

Institute of Technology, Wright- Patterson AFB OH, January 1994. 4. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 5...Technology, Wright-Patterson AFB OH 5 April 1994. 29. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 30. Office of
Genetic Programming Transforms in Linear Regression Situations

NASA Astrophysics Data System (ADS)

Castillo, Flor; Kordon, Arthur; Villa, Carlos

The chapter summarizes the use of Genetic Programming (GP) inMultiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Multivariate Regression Analysis of Winter Ozone Events in the Uinta Basin of Eastern Utah, USA

NASA Astrophysics Data System (ADS)

Mansfield, M. L.

2012-12-01

I report on a regression analysis of a number of variables that are involved in the formation of winter ozone in the Uinta Basin of Eastern Utah. One goal of the analysis is to develop a mathematical model capable of predicting the daily maximum ozone concentration from values of a number of independent variables. The dependent variable is the daily maximum ozone concentration at a particular site in the basin. Independent variables are (1) daily lapse rate, (2) daily "basin temperature" (defined below), (3) snow cover, (4) midday solar zenith angle, (5) monthly oil production, (6) monthly gas production, and (7) the number of days since the beginning of a multi-day inversion event. Daily maximum temperature and daily snow cover data are available at ten or fifteen different sites throughout the basin. The daily lapse rate is defined operationally as the slope of the linear least-squares fit to the temperature-altitude plot, and the "basin temperature" is defined as the value assumed by the same least-squares line at an altitude of 1400 m. A multi-day inversion event is defined as a set of consecutive days for which the lapse rate remains positive. The standard deviation in the accuracy of the model is about 10 ppb. The model has been combined with historical climate and oil & gas production data to estimate historical ozone levels.
SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

EPA Science Inventory

As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Prediction of siRNA potency using sparse logistic regression.

PubMed

Hu, Wei; Hu, John

2014-06-01

RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Predictive and mechanistic multivariate linear regression models for reaction development

PubMed Central

Santiago, Celine B.; Guo, Jing-Yao

2018-01-01

Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Adding a Parameter Increases the Variance of an Estimated Regression Function

ERIC Educational Resources Information Center

Withers, Christopher S.; Nadarajah, Saralees

2011-01-01

The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
Comparison of Linear and Non-linear Regression Analysis to Determine Pulmonary Pressure in Hyperthyroidism.

PubMed

Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan

2017-01-01

This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
Random regression analyses using B-spline functions to model growth of Nellore cattle.

PubMed

Boligon, A A; Mercadante, M E Z; Lôbo, R B; Baldi, F; Albuquerque, L G

2012-02-01

The objective of this study was to estimate (co)variance components using random regression on B-spline functions to weight records obtained from birth to adulthood. A total of 82 064 weight records of 8145 females obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil) which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as random covariate. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses with nine weight traits and to a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight, such as at young ages, should be performed taking into account an increase in mature cow weight. Particularly, this is important in most of Nellore beef cattle production systems, where the cow herd is maintained on range conditions. There is limited modification of the growth curve of Nellore cattle with respect to the aim of selecting them for rapid growth at young ages while maintaining constant adult weight.
Assessing the Causality between Blood Pressure and Retinal Vascular Caliber through Mendelian Randomisation

NASA Astrophysics Data System (ADS)

Li, Ling-Jun; Liao, Jiemin; Cheung, Carol Yim-Lui; Ikram, M. Kamran; Shyong, Tai E.; Wong, Tien-Yin; Cheng, Ching-Yu

2016-02-01

We aimed to determine the association between blood pressure (BP) and retinal vascular caliber changes that were free from confounders and reverse causation by using Mendelian randomisation. A total of 6528 participants from a multi-ethnic cohort (Chinese, Malays, and Indians) in Singapore were included in this study. Retinal arteriolar and venular caliber was measured by a semi-automated computer program. Genotyping was done using Illumina 610-quad chips. Meta-analysis of association between BP, and retinal arteriolar and venular caliber across three ethnic groups was performed both in conventional linear regression and Mendelian randomisation framework with a genetic risk score of BP as an instrumental variable. In multiple linear regression models, each 10 mm Hg increase in systolic BP, diastolic BP, and mean arterial BP (MAP) was associated with significant decreases in retinal arteriolar caliber of a 1.4, 3.0, and 2.6 μm, and significant decreases in retinal venular caliber of a 0.6, 0.7, and 0.9 μm, respectively. In a Mendelian randomisation model, only associations between DBP and MAP and retinal arteriolar narrowing remained yet its significance was greatly reduced. Our data showed weak evidence of a causal relationship between elevated BP and retinal arteriolar narrowing.
Large signal-to-noise ratio quantification in MLE for ARARMAX models

NASA Astrophysics Data System (ADS)

Zou, Yiqun; Tang, Xiafei

2014-06-01

It has been shown that closed-loop linear system identification by indirect method can be generally transferred to open-loop ARARMAX (AutoRegressive AutoRegressive Moving Average with eXogenous input) estimation. For such models, the gradient-related optimisation with large enough signal-to-noise ratio (SNR) can avoid the potential local convergence in maximum likelihood estimation. To ease the application of this condition, the threshold SNR needs to be quantified. In this paper, we build the amplitude coefficient which is an equivalence to the SNR and prove the finiteness of the threshold amplitude coefficient within the stability region. The quantification of threshold is achieved by the minimisation of an elaborately designed multi-variable cost function which unifies all the restrictions on the amplitude coefficient. The corresponding algorithm based on two sets of physically realisable system input-output data details the minimisation and also points out how to use the gradient-related method to estimate ARARMAX parameters when local minimum is present as the SNR is small. Then, the algorithm is tested on a theoretical AutoRegressive Moving Average with eXogenous input model for the derivation of the threshold and a gas turbine engine real system for model identification, respectively. Finally, the graphical validation of threshold on a two-dimensional plot is discussed.
Sensitivity Analysis of Mechanical Parameters of Different Rock Layers to the Stability of Coal Roadway in Soft Rock Strata

PubMed Central

Zhao, Zeng-hui; Wang, Wei-ming; Gao, Xin; Yan, Ji-xing

2013-01-01

According to the geological characteristics of Xinjiang Ili mine in western area of China, a physical model of interstratified strata composed of soft rock and hard coal seam was established. Selecting the tunnel position, deformation modulus, and strength parameters of each layer as influencing factors, the sensitivity coefficient of roadway deformation to each parameter was firstly analyzed based on a Mohr-Columb strain softening model and nonlinear elastic-plastic finite element analysis. Then the effect laws of influencing factors which showed high sensitivity were further discussed. Finally, a regression model for the relationship between roadway displacements and multifactors was obtained by equivalent linear regression under multiple factors. The results show that the roadway deformation is highly sensitive to the depth of coal seam under the floor which should be considered in the layout of coal roadway; deformation modulus and strength of coal seam and floor have a great influence on the global stability of tunnel; on the contrary, roadway deformation is not sensitive to the mechanical parameters of soft roof; roadway deformation under random combinations of multi-factors can be deduced by the regression model. These conclusions provide theoretical significance to the arrangement and stability maintenance of coal roadway. PMID:24459447

An Application to the Prediction of LOD Change Based on General Regression Neural Network

NASA Astrophysics Data System (ADS)

Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.

2011-07-01

Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.
Comparison of Selection Procedures and Validation of Criterion Used in Selection of Significant Control Variates of a Simulation Model

DTIC Science & Technology

1990-03-01

and M.H. Knuter. Applied Linear Regression Models. Homewood IL: Richard D. Erwin Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM...Control Variates in Simulation," European Journal of Operational Research, 42: (1989). Neter, J., W. Wasserman, and M.H. Xnuter. Applied Linear Regression Models
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

ERIC Educational Resources Information Center

Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

2015-01-01

Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

ERIC Educational Resources Information Center

Camilleri, Liberato; Cefai, Carmel

2013-01-01

Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
A secure distributed logistic regression protocol for the detection of rare adverse drug events

PubMed Central

El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat

2013-01-01

Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models. PMID:22871397
A secure distributed logistic regression protocol for the detection of rare adverse drug events.

PubMed

El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat

2013-05-01

There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models.
Intimate partner violence and anxiety disorders in pregnancy: the importance of vocational training of the nursing staff in facing them1

PubMed Central

Fonseca-Machado, Mariana de Oliveira; Monteiro, Juliana Cristina dos Santos; Haas, Vanderlei José; Abrão, Ana Cristina Freitas de Vilhena; Gomes-Sponholz, Flávia

2015-01-01

Objective: to identify the relationship between posttraumatic stress disorder, trait and state anxiety, and intimate partner violence during pregnancy. Method: observational, cross-sectional study developed with 358 pregnant women. The Posttraumatic Stress Disorder Checklist - Civilian Version was used, as well as the State-Trait Anxiety Inventory and an adapted version of the instrument used in the World Health Organization Multi-country Study on Women's Health and Domestic Violence. Results: after adjusting to the multiple logistic regression model, intimate partner violence, occurred during pregnancy, was associated with the indication of posttraumatic stress disorder. The adjusted multiple linear regression models showed that the victims of violence, in the current pregnancy, had higher symptom scores of trait and state anxiety than non-victims. Conclusion: recognizing the intimate partner violence as a clinically relevant and identifiable risk factor for the occurrence of anxiety disorders during pregnancy can be a first step in the prevention thereof. PMID:26487135
Random regression models on Legendre polynomials to estimate genetic parameters for weights from birth to adult age in Canchim cattle.

PubMed

Baldi, F; Albuquerque, L G; Alencar, M M

2010-08-01

The objective of this work was to estimate covariance functions for direct and maternal genetic effects, animal and maternal permanent environmental effects, and subsequently, to derive relevant genetic parameters for growth traits in Canchim cattle. Data comprised 49,011 weight records on 2435 females from birth to adult age. The model of analysis included fixed effects of contemporary groups (year and month of birth and at weighing) and age of dam as quadratic covariable. Mean trends were taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were allowed to vary and were modelled by a step function with 1, 4 or 11 classes based on animal's age. The model fitting four classes of residual variances was the best. A total of 12 random regression models from second to seventh order were used to model direct and maternal genetic effects, animal and maternal permanent environmental effects. The model with direct and maternal genetic effects, animal and maternal permanent environmental effects fitted by quadric, cubic, quintic and linear Legendre polynomials, respectively, was the most adequate to describe the covariance structure of the data. Estimates of direct and maternal heritability obtained by multi-trait (seven traits) and random regression models were very similar. Selection for higher weight at any age, especially after weaning, will produce an increase in mature cow weight. The possibility to modify the growth curve in Canchim cattle to obtain animals with rapid growth at early ages and moderate to low mature cow weight is limited.
Cardiovascular diseases and air pollution in Novi Sad, Serbia.

PubMed

Jevtić, Marija; Dragić, Nataša; Bijelović, Sanja; Popović, Milka

2014-04-01

A large body of evidence has documented that air pollutants have adverse effect on human health as well as on the environment. The aim of this study was to determine whether there was an association between outdoor concentrations of sulfur dioxide (SO2) and nitrogen dioxide (NO2) and a daily number of hospital admissions due to cardiovascular diseases (CVD) in Novi Sad, Serbia among patients aged above 18. The investigation was carried out during over a 3-year period (from January 1, 2007 to December 31, 2009) in the area of Novi Sad. The number (N = 10 469) of daily CVD (ICD-10: I00-I99) hospital admissions was collected according to patients' addresses. Daily mean levels of NO2 and SO2, measured in the ambient air of Novi Sad via a network of fixed samplers, have been used to put forward outdoor air pollution. Associations between air pollutants and hospital admissions were firstly analyzed by the use of the linear regression in a single polluted model, and then trough a single and multi-polluted adjusted generalized linear Poisson model. The single polluted model (without confounding factors) indicated that there was a linear increase in the number of hospital admissions due to CVD in relation to the linear increase in concentrations of SO2 (p = 0.015; 95% confidence interval (95% CI): 0.144-1.329, R(2) = 0.005) and NO2 (p = 0.007; 95% CI: 0.214-1.361, R(2) = 0.007). However, the single and multi-polluted adjusted models revealed that only NO2 was associated with the CVD (p = 0.016, relative risk (RR) = 1.049, 95% CI: 1.009-1.091 and p = 0.022, RR = 1.047, 95% CI: 1.007-1.089, respectively). This study shows a significant positive association between hospital admissions due to CVD and outdoor NO2 concentrations in the area of Novi Sad, Serbia.
Application of Hierarchical Linear Models/Linear Mixed-Effects Models in School Effectiveness Research

ERIC Educational Resources Information Center

Ker, H. W.

2014-01-01

Multilevel data are very common in educational research. Hierarchical linear models/linear mixed-effects models (HLMs/LMEs) are often utilized to analyze multilevel data nowadays. This paper discusses the problems of utilizing ordinary regressions for modeling multilevel educational data, compare the data analytic results from three regression…
Estimation of continuous multi-DOF finger joint kinematics from surface EMG using a multi-output Gaussian Process.

PubMed

Ngeo, Jimson; Tamei, Tomoya; Shibata, Tomohiro

2014-01-01

Surface electromyographic (EMG) signals have often been used in estimating upper and lower limb dynamics and kinematics for the purpose of controlling robotic devices such as robot prosthesis and finger exoskeletons. However, in estimating multiple and a high number of degrees-of-freedom (DOF) kinematics from EMG, output DOFs are usually estimated independently. In this study, we estimate finger joint kinematics from EMG signals using a multi-output convolved Gaussian Process (Multi-output Full GP) that considers dependencies between outputs. We show that estimation of finger joints from muscle activation inputs can be improved by using a regression model that considers inherent coupling or correlation within the hand and finger joints. We also provide a comparison of estimation performance between different regression methods, such as Artificial Neural Networks (ANN) which is used by many of the related studies. We show that using a multi-output GP gives improved estimation compared to multi-output ANN and even dedicated or independent regression models.
Mechanisms and kinetics models for ultrasonic waste activated sludge disintegration.

PubMed

Wang, Fen; Wang, Yong; Ji, Min

2005-08-31

Ultrasonic energy can be applied as pre-treatment to disintegrate sludge flocs and disrupt bacterial cells' walls, and the hydrolysis can be improved, so that the rate of sludge digestion and methane production is improved. In this paper, by adding NaHCO3 to mask the oxidizing effect of OH, the mechanisms of disintegration are investigated. In addition, kinetics models for ultrasonic sludge disintegration are established by applying multi-variable linear regression method. It has been found that hydro-mechanical shear forces predominantly responsible for the disintegration, and the contribution of oxidizing effect of OH increases with the amount of the ultrasonic density and ultrasonic intensity. It has also been inferred from the kinetics model which dependent variable is SCOD+ that both sludge pH and sludge concentration significantly affect the disintegration.
Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing.

PubMed

Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A

2015-05-09

Genomic selection (GS) in forestry can substantially reduce the length of breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the Generalized Ridge Regression (GRR) to test different assumption about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site were high and better than those of single-sites while multi-site predictability produced the lowest accuracies reflecting type-b genetic correlations and deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates as half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principle component scores as representatives of multi-trait GS prediction models produced surprising results where negatively correlated traits could be concurrently selected for using PCA2 and PCA3. The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that single-site GS models ability to predict other sites are unreliable supporting the utilization of multi-site approach. Principle component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
Reference Models for Multi-Layer Tissue Structures

DTIC Science & Technology

2016-09-01

simulation, finite element analysis 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 19a. NAME OF RESPONSIBLE PERSON USAMRMC...Physiologically realistic, fully specimen-specific, nonlinear reference models. Tasks. Finite element analysis of non-linear mechanics of cadaver...models. Tasks. Finite element analysis of non-linear mechanics of multi-layer tissue regions of human subjects. Deliverables. Partially subject- and
Application of General Regression Neural Network to the Prediction of LOD Change

NASA Astrophysics Data System (ADS)

Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao

2012-01-01

Traditional methods for predicting the change in length of day (LOD change) are mainly based on some linear models, such as the least square model and autoregression model, etc. However, the LOD change comprises complicated non-linear factors and the prediction effect of the linear models is always not so ideal. Thus, a kind of non-linear neural network — general regression neural network (GRNN) model is tried to make the prediction of the LOD change and the result is compared with the predicted results obtained by taking advantage of the BP (back propagation) neural network model and other models. The comparison result shows that the application of the GRNN to the prediction of the LOD change is highly effective and feasible.
Application of third molar development and eruption models in estimating dental age in Malay sub-adults.

PubMed

Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc

2015-08-01

The third molar development (TMD) has been widely utilized as one of the radiographic method for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated to the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² yielded an increment in the linear regressions of combined model as compared to the individual models. The overall decrease in RMSE was detected in combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, low value of adjusted R(2) and high RMSE except in male were exhibited in combined model. Dental age estimation is better predicted using combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Modeling methodology for a CMOS-MEMS electrostatic comb

NASA Astrophysics Data System (ADS)

Iyer, Sitaraman V.; Lakdawala, Hasnain; Mukherjee, Tamal; Fedder, Gary K.

2002-04-01

A methodology for combined modeling of capacitance and force 9in a multi-layer electrostatic comb is demonstrated in this paper. Conformal mapping-based analytical methods are limited to 2D symmetric cross-sections and cannot account for charge concentration effects at corners. Vertex capacitance can be more than 30% of the total capacitance in a single-layer 2 micrometers thick comb with 10 micrometers overlap. Furthermore, analytical equations are strictly valid only for perfectly symmetrical finger positions. Fringing and corner effects are likely to be more significant in a multi- layered CMOS-MEMS comb because of the presence of more edges and vertices. Vertical curling of CMOS-MEMS comb fingers may also lead to reduced capacitance and vertical forces. Gyroscopes are particularly sensitive to such undesirable forces, which therefore, need to be well-quantified. In order to address the above issues, a hybrid approach of superposing linear regression models over a set of core analytical models is implemented. Design of experiments is used to obtain data for capacitance and force using a commercial 3D boundary-element solver. Since accurate force values require significantly higher mesh refinement than accurate capacitance, we use numerical derivatives of capacitance values to compute the forces. The model is formulated such that the capacitance and force models use the same regression coefficients. The comb model thus obtained, fits the numerical capacitance data to within +/- 3% and force to within +/- 10%. The model is experimentally verified by measuring capacitance change in a specially designed test structure. The capacitance model matches measurements to within 10%. The comb model is implemented in an Analog Hardware Description Language (ADHL) for use in behavioral simulation of manufacturing variations in a CMOS-MEMS gyroscope.
A novel multi-target regression framework for time-series prediction of drug efficacy.

PubMed

Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin

2017-01-18

Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task.
A novel multi-target regression framework for time-series prediction of drug efficacy

PubMed Central

Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin

2017-01-01

Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task. PMID:28098186
Simple linear and multivariate regression models.

PubMed

Rodríguez del Águila, M M; Benítez-Parejo, N

2011-01-01

In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.

RBF kernel based support vector regression to estimate the blood volume and heart rate responses during hemodialysis.

PubMed

Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H

2009-01-01

This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
Use of probabilistic weights to enhance linear regression myoelectric control

NASA Astrophysics Data System (ADS)

Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

2015-12-01

Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
CO2 flux determination by closed-chamber methods can be seriously biased by inappropriate application of linear regression

NASA Astrophysics Data System (ADS)

Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.

2007-11-01

Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. Especially the effects of turbulence and pressure disturbances by the chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Monitoring Springs in the Mojave Desert Using Landsat Time Series Analysis

NASA Technical Reports Server (NTRS)

Potter, Christopher S.

2018-01-01

The purpose of this study, based on Landsat satellite data was to characterize variations and trends over 30 consecutive years (1985-2016) in perennial vegetation green cover at over 400 confirmed Mojave Desert spring locations. These springs were surveyed between in 2015 and 2016 on lands managed in California by the U.S. Bureau of Land Management (BLM) and on several land trusts within the Barstow, Needles, and Ridgecrest BLM Field Offices. The normalized difference vegetation index (NDVI) from July Landsat images was computed at each spring location and a trend model was first fit to the multi-year NDVI time series using least squares linear regression.Â
Liver fat contents, abdominal adiposity and insulin resistance in non-diabetic prevalent hemodialysis patients.

PubMed

Chen, Hung-Yuan; Lin, Chien-Chu; Chiu, Yen-Ling; Hsu, Shih-Ping; Pai, Mei-Fen; Yang, Ju-Yeh; Wu, Hon-Yen; Peng, Yu-Sen

2014-01-01

The liver fat contents and abdominal adiposity correlate well with insulin resistance (IR) in the general population. However, the relationship between liver fat content, abdominal adiposity and IR in non-diabetic hemodialysis (HD) patients remains unclear. This study aimed to clarify the associations among these factors. This is a cross-sectional, observational study. All patients received abdominal ultrasound for liver fat content. Abdominal adiposity was quantified with the conicity index (Ci) and waist circumference (WC). We checked the homeostasis model assessment for insulin resistance index (HOMA-IR) for IR. A total of 112 patients (60 women) were analyzed. Subjects with higher liver fat contents and WC had higher IR indices. But Ci did not correlate with IR indices. In both the multi-variable linear regression model and the logistic regression model, only higher liver fat content predicted a severe IR status. Liver fat contents have a remarkable correlation with IR; however, abdominal adiposity, measured either by Ci or WC, dose not independently correlate with IR in non-diabetic prevalent HD patients. © 2014 S. Karger AG, Basel.
An Evaluation of the Automated Cost Estimating Integrated Tools (ACEIT) System

DTIC Science & Technology

1989-09-01

residual and it is described as the residual divided by its standard deviation (13:App A,17). Neter, Wasserman, and Kutner, in Applied Linear Regression Models...others. Applied Linear Regression Models. Homewood IL: Irwin, 1983. 19. Raduchel, William J. "A Professional’s Perspective on User-Friendliness," Byte
Conjoint Analysis: A Study of the Effects of Using Person Variables.

ERIC Educational Resources Information Center

Fraas, John W.; Newman, Isadore

Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

NASA Technical Reports Server (NTRS)

MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

2005-01-01

Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Random regression models using different functions to model test-day milk yield of Brazilian Holstein cows.

PubMed

Bignardi, A B; El Faro, L; Torres Júnior, R A A; Cardoso, V L; Machado, P F; Albuquerque, L G

2011-10-31

We analyzed 152,145 test-day records from 7317 first lactations of Holstein cows recorded from 1995 to 2003. Our objective was to model variations in test-day milk yield during the first lactation of Holstein cows by random regression model (RRM), using various functions in order to obtain adequate and parsimonious models for the estimation of genetic parameters. Test-day milk yields were grouped into weekly classes of days in milk, ranging from 1 to 44 weeks. The contemporary groups were defined as herd-test-day. The analyses were performed using a single-trait RRM, including the direct additive, permanent environmental and residual random effects. In addition, contemporary group and linear and quadratic effects of the age of cow at calving were included as fixed effects. The mean trend of milk yield was modeled with a fourth-order orthogonal Legendre polynomial. The additive genetic and permanent environmental covariance functions were estimated by random regression on two parametric functions, Ali and Schaeffer and Wilmink, and on B-spline functions of days in milk. The covariance components and the genetic parameters were estimated by the restricted maximum likelihood method. Results from RRM parametric and B-spline functions were compared to RRM on Legendre polynomials and with a multi-trait analysis, using the same data set. Heritability estimates presented similar trends during mid-lactation (13 to 31 weeks) and between week 37 and the end of lactation, for all RRM. Heritabilities obtained by multi-trait analysis were of a lower magnitude than those estimated by RRM. The RRMs with a higher number of parameters were more useful to describe the genetic variation of test-day milk yield throughout the lactation. RRM using B-spline and Legendre polynomials as base functions appears to be the most adequate to describe the covariance structure of the data.
Comparison of linear and non-linear models for predicting energy expenditure from raw accelerometer data.

PubMed

Montoye, Alexander H K; Begum, Munni; Henning, Zachary; Pfeiffer, Karin A

2017-02-01

This study had three purposes, all related to evaluating energy expenditure (EE) prediction accuracy from body-worn accelerometers: (1) compare linear regression to linear mixed models, (2) compare linear models to artificial neural network models, and (3) compare accuracy of accelerometers placed on the hip, thigh, and wrists. Forty individuals performed 13 activities in a 90 min semi-structured, laboratory-based protocol. Participants wore accelerometers on the right hip, right thigh, and both wrists and a portable metabolic analyzer (EE criterion). Four EE prediction models were developed for each accelerometer: linear regression, linear mixed, and two ANN models. EE prediction accuracy was assessed using correlations, root mean square error (RMSE), and bias and was compared across models and accelerometers using repeated-measures analysis of variance. For all accelerometer placements, there were no significant differences for correlations or RMSE between linear regression and linear mixed models (correlations: r = 0.71-0.88, RMSE: 1.11-1.61 METs; p > 0.05). For the thigh-worn accelerometer, there were no differences in correlations or RMSE between linear and ANN models (ANN-correlations: r = 0.89, RMSE: 1.07-1.08 METs. Linear models-correlations: r = 0.88, RMSE: 1.10-1.11 METs; p > 0.05). Conversely, one ANN had higher correlations and lower RMSE than both linear models for the hip (ANN-correlation: r = 0.88, RMSE: 1.12 METs. Linear models-correlations: r = 0.86, RMSE: 1.18-1.19 METs; p < 0.05), and both ANNs had higher correlations and lower RMSE than both linear models for the wrist-worn accelerometers (ANN-correlations: r = 0.82-0.84, RMSE: 1.26-1.32 METs. Linear models-correlations: r = 0.71-0.73, RMSE: 1.55-1.61 METs; p < 0.01). For studies using wrist-worn accelerometers, machine learning models offer a significant improvement in EE prediction accuracy over linear models. Conversely, linear models showed similar EE prediction accuracy to machine learning models for hip- and thigh-worn accelerometers and may be viable alternative modeling techniques for EE prediction for hip- or thigh-worn accelerometers.
Testing hypotheses for differences between linear regression lines

Treesearch

Stanley J. Zarnoch

2009-01-01

Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
Spatial Assessment of Model Errors from Four Regression Techniques

Treesearch

Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove

2005-01-01

Fomst modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographicalIy weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Optimization of isotherm models for pesticide sorption on biopolymer-nanoclay composite by error analysis.

PubMed

Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M

2017-04-01

A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C 14 -C 18 ) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and non linear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by non linear regression method. The non linear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q 0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and non-linear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
Combined analysis of magnetic and gravity anomalies using normalized source strength (NSS)

NASA Astrophysics Data System (ADS)

Li, L.; Wu, Y.

2017-12-01

Gravity field and magnetic field belong to potential fields which lead inherent multi-solution. Combined analysis of magnetic and gravity anomalies based on Poisson's relation is used to determinate homology gravity and magnetic anomalies and decrease the ambiguity. The traditional combined analysis uses the linear regression of the reduction to pole (RTP) magnetic anomaly to the first order vertical derivative of the gravity anomaly, and provides the quantitative or semi-quantitative interpretation by calculating the correlation coefficient, slope and intercept. In the calculation process, due to the effect of remanent magnetization, the RTP anomaly still contains the effect of oblique magnetization. In this case the homology gravity and magnetic anomalies display irrelevant results in the linear regression calculation. The normalized source strength (NSS) can be transformed from the magnetic tensor matrix, which is insensitive to the remanence. Here we present a new combined analysis using NSS. Based on the Poisson's relation, the gravity tensor matrix can be transformed into the pseudomagnetic tensor matrix of the direction of geomagnetic field magnetization under the homologous condition. The NSS of pseudomagnetic tensor matrix and original magnetic tensor matrix are calculated and linear regression analysis is carried out. The calculated correlation coefficient, slope and intercept indicate the homology level, Poisson's ratio and the distribution of remanent respectively. We test the approach using synthetic model under complex magnetization, the results show that it can still distinguish the same source under the condition of strong remanence, and establish the Poisson's ratio. Finally, this approach is applied in China. The results demonstrated that our approach is feasible.
A Model Comparison for Count Data with a Positively Skewed Distribution with an Application to the Number of University Mathematics Courses Completed

ERIC Educational Resources Information Center

Liou, Pey-Yan

2009-01-01

The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Unit Cohesion and the Surface Navy: Does Cohesion Affect Performance

DTIC Science & Technology

1989-12-01

v. 68, 1968. Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. Rand Corporation R-2607...Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. SAS User’s Guide: Basics, Version 5 ed
What Is Wrong with ANOVA and Multiple Regression? Analyzing Sentence Reading Times with Hierarchical Linear Models

ERIC Educational Resources Information Center

Richter, Tobias

2006-01-01

Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
Optimal selection of MULTI-model downscaled ensembles for interannual and seasonal climate prediction in the eastern seaboard of Thailand

NASA Astrophysics Data System (ADS)

Bejranonda, W.; Koch, M.

2010-12-01

Because of the imminent threat of the water resources of the eastern seaboard of Thailand, a climate impact study has been carried out there. To that avail, a hydrological watershed model is being used to simulate the future water availability in the wake of possible climate change in the region. The hydrological model is forced by predictions from global climate models (GCMs) that are to be downscaled in an appropriate manner. The challenge at that stage of the climate impact analysis lies then the in the choice of the best GCM and the (statistical) downscaling method. In this study the selection of coarse grid resolution output of the GCMs, transferring information to the fine grid of local climate-hydrology is achieved by cross-correlation and multiple linear regression using meteorological data in the eastern seaboard of Thailand observed between 1970-1999. The grids of 20 atmosphere/ocean global climate models (AOGCM), covering latitude 12.5-15.0 N and longitude 100.0-102.5 E were examined using the Climate-Change Scenario Generator (SCENGEN). With that tool the model efficiency of the prediction of daily precipitation and mean temperature was calculated by comparing the 1980-1999 ECMWF reanalysis predictions with the observed data during that time period. The root means square errors of the predictions were considered and ranked to select the top 5 models, namely, BCCR-BCM2.0, GISS-ER, ECHO-G, ECHAM5/MPI-OM and PCM. The daily time-series of 338 predictors in 9 runs of the 5 selected models were gathered from the CMIP3 multi-model database. Monthly time-serial cross-correlations between the climate predictors and the meteorological measurements from 25 rainfall, 4 minimum and maximum temperature, 4 humidity and 2 solar radiation stations in the study area were then computed and ranked. Using the ranked predictors, a multiple-linear regression model (downscaling transfer model) to forecast the local climate was set up. To improve the prediction power of this GCM downscaling approach, the regression equations were considered as a dynamic regression model that can alter the predictor by seasonal variation. The possible seasonal effect was examined for the 1974-1999 period which was equally divided into a calibration and verification sub-period. The calibrated model using the whole observed time-series was compared with the models separated into 2 seasons; dry and wet, 3 seasons; winter, summer and rainy, and 4 seasons; dry, pre-monsoon, first monsoon and second monsoon. The verification power of the various model variants was measured considering Akaike's information criterion (AIC) and the Nash-Sutcliffe coefficient of the corresponding model fit. The results show that the 4-seasons-variation prediction works best. The highest efficiency for the prediction of rainfall is achieved for the dry season, Oct-Mar, whereas the smallest efficiency is obtained in the monsoon seasons. The overall number of predictor giving top efficiency lies between 3 and 20 in the regression models. In the next, still ongoing stage of the climate impact study the predictions from this new, seasonally optimized downscaling transfer model are being used in the simulations of the future hydrological water budget in that region of Thailand.
A Comparison between Multiple Regression Models and CUN-BAE Equation to Predict Body Fat in Adults

PubMed Central

Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A.; Aguiló, Antoni

2015-01-01

Background Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Methods Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. Results The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). Conclusions There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF. PMID:25821960
A comparison between multiple regression models and CUN-BAE equation to predict body fat in adults.

PubMed

Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A; Aguiló, Antoni

2015-01-01

Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Multi regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF will be compared. These models will be also compared with the CUN-BAE equation. For all the analysis a sample including all the participants and another one including only the overweight and obese subjects will be considered. The BF reference measure was made using Bioelectrical Impedance Analysis. The simplest models including only BMI or BAI as independent variables showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than the BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0:87 vs. ρ = 0:86 for the whole sample and ρ = 0:88 vs. ρ = 0:89 for overweight and obese subjects, being the second value the one for CUN-BAE). There are simpler models than CUN-BAE equation that fits BF as well as CUN-BAE does. Therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF.

Multivariate decoding of brain images using ordinal regression.

PubMed

Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F

2013-11-01

Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.
Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

NASA Astrophysics Data System (ADS)

Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

2017-03-01

Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.
Linearity versus Nonlinearity of Offspring-Parent Regression: An Experimental Study of Drosophila Melanogaster

PubMed Central

Gimelfarb, A.; Willis, J. H.

1994-01-01

An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Enhancing oil removal from water by immobilizing multi-wall carbon nanotubes on the surface of polyurethane foam.

PubMed

Keshavarz, Alireza; Zilouei, Hamid; Abdolmaleki, Amir; Asadinezhad, Ahmad

2015-07-01

A surface modification method was carried out to enhance the light crude oil sorption capacity of polyurethane foam (PUF) through immobilization of multi-walled carbon nanotube (MWCNT) on the foam surface at various concentrations. The developed sorbent was characterized using scanning electron microscopy, Fourier transform infrared spectroscopy, thermogravimetric analysis, and tensile elongation test. The results obtained from thermogravimetric and tensile elongation tests showed the improvement of thermal and mechanical resistance of surface-modified foam. The experimental data also revealed that the immobilization of MWCNT on PUF surface enhanced the sorption capacity of light crude oil and reduced water sorption. The highest oil removal capacity was obtained for 1 wt% MWCNT on PUF surface which was 21.44% enhancement in light crude oil sorption compared to the blank PUF. The reusability of surface modified PUF was determined through four cycles of chemical regeneration using petroleum ether. The adsorption of light crude oil with 30 g initial mass showed that 85.45% of the initial oil sorption capacity of this modified sorbent was remained after four regeneration cycles. Equilibrium isotherms for adsorption of oil were analyzed by the Freundlich, Langmuir, Temkin, and Redlich-Peterson models through linear and non-linear regression methods. Results of equilibrium revealed that Langmuir isotherm is the best fitting model and non-linear method is a more accurate way to predict the parameters involved in the isotherms. The overall findings suggested the promising potentials of the developed sorbent in order to be efficiently used in large-scale oil spill cleanup. Copyright © 2015 Elsevier Ltd. All rights reserved.
Neural network modeling for surgical decisions on traumatic brain injury patients.

PubMed

Li, Y C; Liu, L; Chiu, W T; Jian, W S

2000-01-01

Computerized medical decision support systems have been a major research topic in recent years. Intelligent computer programs were implemented to aid physicians and other medical professionals in making difficult medical decisions. This report compares three different mathematical models for building a traumatic brain injury (TBI) medical decision support system (MDSS). These models were developed based on a large TBI patient database. This MDSS accepts a set of patient data such as the types of skull fracture, Glasgow Coma Scale (GCS), episode of convulsion and return the chance that a neurosurgeon would recommend an open-skull surgery for this patient. The three mathematical models described in this report including a logistic regression model, a multi-layer perceptron (MLP) neural network and a radial-basis-function (RBF) neural network. From the 12,640 patients selected from the database. A randomly drawn 9480 cases were used as the training group to develop/train our models. The other 3160 cases were in the validation group which we used to evaluate the performance of these models. We used sensitivity, specificity, areas under receiver-operating characteristics (ROC) curve and calibration curves as the indicator of how accurate these models are in predicting a neurosurgeon's decision on open-skull surgery. The results showed that, assuming equal importance of sensitivity and specificity, the logistic regression model had a (sensitivity, specificity) of (73%, 68%), compared to (80%, 80%) from the RBF model and (88%, 80%) from the MLP model. The resultant areas under ROC curve for logistic regression, RBF and MLP neural networks are 0.761, 0.880 and 0.897, respectively (P < 0.05). Among these models, the logistic regression has noticeably poorer calibration. This study demonstrated the feasibility of applying neural networks as the mechanism for TBI decision support systems based on clinical databases. The results also suggest that neural networks may be a better solution for complex, non-linear medical decision support systems than conventional statistical techniques such as logistic regression.
Analysis and prediction of flow from local source in a river basin using a Neuro-fuzzy modeling tool.

PubMed

Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi

2007-10-01

Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, the multiple linear will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
An hourly PM10 diagnosis model for the Bilbao metropolitan area using a linear regression methodology.

PubMed

González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O

2013-07-01

There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas becoming one of the main air pollution concerns. This is the case on the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative-log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity-and qualitative-working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation. Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following the Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training (it consists of 6 years) and the testing dataset (it consists of 1 year). The results of each type of model show that the INT-BIC-based model (R(2) = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 at seasonally means) with respect to shorter periods.
Regression analysis using dependent Polya trees.

PubMed

Schörgendorfer, Angela; Branscum, Adam J

2013-11-30

Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

PubMed

Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

2006-08-01

A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Hypothesis testing in functional linear regression models with Neyman's truncation and wavelet thresholding for longitudinal data.

PubMed

Yang, Xiaowei; Nie, Kun

2008-03-15

Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
High-order fuzzy time-series based on multi-period adaptation model for forecasting stock markets

NASA Astrophysics Data System (ADS)

Chen, Tai-Liang; Cheng, Ching-Hsue; Teoh, Hia-Jong

2008-02-01

Stock investors usually make their short-term investment decisions according to recent stock information such as the late market news, technical analysis reports, and price fluctuations. To reflect these short-term factors which impact stock price, this paper proposes a comprehensive fuzzy time-series, which factors linear relationships between recent periods of stock prices and fuzzy logical relationships (nonlinear relationships) mined from time-series into forecasting processes. In empirical analysis, the TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) and HSI (Heng Seng Index) are employed as experimental datasets, and four recent fuzzy time-series models, Chen’s (1996), Yu’s (2005), Cheng’s (2006) and Chen’s (2007), are used as comparison models. Besides, to compare with conventional statistic method, the method of least squares is utilized to estimate the auto-regressive models of the testing periods within the databases. From analysis results, the performance comparisons indicate that the multi-period adaptation model, proposed in this paper, can effectively improve the forecasting performance of conventional fuzzy time-series models which only factor fuzzy logical relationships in forecasting processes. From the empirical study, the traditional statistic method and the proposed model both reveal that stock price patterns in the Taiwan stock and Hong Kong stock markets are short-term.
Automated retrieval of forest structure variables based on multi-scale texture analysis of VHR satellite imagery

NASA Astrophysics Data System (ADS)

Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine

2014-10-01

The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on the automatization of the process of identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is the underlying parameters which are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed which is based on statistical modeling, exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, we called Random PRiF, is proposed. It relies on random sampling in feature space, carefully addresses the multicollinearity issue in multiple-linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ˜ 1.1 m is expected on crown diameter, ˜ 0.9 m on tree spacing, ˜ 3 m on height and ˜ 0.06 m on diameter at breast height.
Local hyperspectral data multisharpening based on linear/linear-quadratic nonnegative matrix factorization by integrating lidar data

NASA Astrophysics Data System (ADS)

Benhalouche, Fatima Zohra; Karoui, Moussa Sofiane; Deville, Yannick; Ouamri, Abdelaziz

2015-10-01

In this paper, a new Spectral-Unmixing-based approach, using Nonnegative Matrix Factorization (NMF), is proposed to locally multi-sharpen hyperspectral data by integrating a Digital Surface Model (DSM) obtained from LIDAR data. In this new approach, the nature of the local mixing model is detected by using the local variance of the object elevations. The hyper/multispectral images are explored using small zones. In each zone, the variance of the object elevations is calculated from the DSM data in this zone. This variance is compared to a threshold value and the adequate linear/linearquadratic spectral unmixing technique is used in the considered zone to independently unmix hyperspectral and multispectral data, using an adequate linear/linear-quadratic NMF-based approach. The obtained spectral and spatial information thus respectively extracted from the hyper/multispectral images are then recombined in the considered zone, according to the selected mixing model. Experiments based on synthetic hyper/multispectral data are carried out to evaluate the performance of the proposed multi-sharpening approach and literature linear/linear-quadratic approaches used on the whole hyper/multispectral data. In these experiments, real DSM data are used to generate synthetic data containing linear and linear-quadratic mixed pixel zones. The DSM data are also used for locally detecting the nature of the mixing model in the proposed approach. Globally, the proposed approach yields good spatial and spectral fidelities for the multi-sharpened data and significantly outperforms the used literature methods.
Multi-water-bag models of ion temperature gradient instability in cylindrical geometry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Coulette, David; Besse, Nicolas

2013-05-15

Ion temperature gradient instabilities play a major role in the understanding of anomalous transport in core fusion plasmas. In the considered cylindrical geometry, ion dynamics is described using a drift-kinetic multi-water-bag model for the parallel velocity dependency of the ion distribution function. In a first stage, global linear stability analysis is performed. From the obtained normal modes, parametric dependencies of the main spectral characteristics of the instability are then examined. Comparison of the multi-water-bag results with a reference continuous Maxwellian case allows us to evaluate the effects of discrete parallel velocity sampling induced by the Multi-Water-Bag model. Differences between themore » global model and local models considered in previous works are discussed. Using results from linear, quasilinear, and nonlinear numerical simulations, an analysis of the first stage saturation dynamics of the instability is proposed, where the divergence between the three models is examined.« less
Estimating Top-of-Atmosphere Thermal Infrared Radiance Using MERRA-2 Atmospheric Data

NASA Astrophysics Data System (ADS)

Kleynhans, Tania

Space borne thermal infrared sensors have been extensively used for environmental research as well as cross-calibration of other thermal sensing systems. Thermal infrared data from satellites such as Landsat and Terra/MODIS have limited temporal resolution (with a repeat cycle of 1 to 2 days for Terra/MODIS, and 16 days for Landsat). Thermal instruments with finer temporal resolution on geostationary satellites have limited utility for cross-calibration due to their large view angles. Reanalysis atmospheric data is available on a global spatial grid at three hour intervals making it a potential alternative to existing satellite image data. This research explores using the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis data product to predict top-of-atmosphere (TOA) thermal infrared radiance globally at time scales finer than available satellite data. The MERRA-2 data product provides global atmospheric data every three hours from 1980 to the present. Due to the high temporal resolution of the MERRA-2 data product, opportunities for novel research and applications are presented. While MERRA-2 has been used in renewable energy and hydrological studies, this work seeks to leverage the model to predict TOA thermal radiance. Two approaches have been followed, namely physics-based approach and a supervised learning approach, using Terra/MODIS band 31 thermal infrared data as reference. The first physics-based model uses forward modeling to predict TOA thermal radiance. The second model infers the presence of clouds from the MERRA-2 atmospheric data, before applying an atmospheric radiative transfer model. The last physics-based model parameterized the previous model to minimize computation time. The second approach applied four different supervised learning algorithms to the atmospheric data. The algorithms included a linear least squares regression model, a non-linear support vector regression (SVR) model, a multi-layer perceptron (MLP), and a convolutional neural network (CNN). This research found that the multi-layer perceptron model produced the lowest error rates overall, with an RMSE of 1.22W / m2 sr mum when compared to actual Terra/MODIS band 31 image data. This research further aimed to characterize the errors associated with each method so that any potential user will have the best information available should they wish to apply these methods towards their own application.
An evaluation of bias in propensity score-adjusted non-linear regression models.

PubMed

Wan, Fei; Mitra, Nandita

2018-03-01

Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
CO2 flux determination by closed-chamber methods can be seriously biased by inappropriate application of linear regression

NASA Astrophysics Data System (ADS)

Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.

2007-07-01

Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Stratification for the propensity score compared with linear regression techniques to assess the effect of treatment or exposure.

PubMed

Senn, Stephen; Graf, Erika; Caputo, Angelika

2007-12-30

Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence the coarsening property of the propensity score-adjustment for a one-dimensional confounder instead of a high-dimensional covariate-may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.
Multi-variant study of obesity risk genes in African Americans: The Jackson Heart Study.

PubMed

Liu, Shijian; Wilson, James G; Jiang, Fan; Griswold, Michael; Correa, Adolfo; Mei, Hao

2016-11-30

Genome-wide association study (GWAS) has been successful in identifying obesity risk genes by single-variant association analysis. For this study, we designed steps of analysis strategy and aimed to identify multi-variant effects on obesity risk among candidate genes. Our analyses were focused on 2137 African American participants with body mass index measured in the Jackson Heart Study and 657 common single nucleotide polymorphisms (SNPs) genotyped at 8 GWAS-identified obesity risk genes. Single-variant association test showed that no SNPs reached significance after multiple testing adjustment. The following gene-gene interaction analysis, which was focused on SNPs with unadjusted p-value<0.10, identified 6 significant multi-variant associations. Logistic regression showed that SNPs in these associations did not have significant linear interactions; examination of genetic risk score evidenced that 4 multi-variant associations had significant additive effects of risk SNPs; and haplotype association test presented that all multi-variant associations contained one or several combinations of particular alleles or haplotypes, associated with increased obesity risk. Our study evidenced that obesity risk genes generated multi-variant effects, which can be additive or non-linear interactions, and multi-variant study is an important supplement to existing GWAS for understanding genetic effects of obesity risk genes. Copyright © 2016 Elsevier B.V. All rights reserved.
An Analysis of Terrestrial and Aquatic Environmental Controls of Riverine Dissolved Organic Carbon in the Conterminous United States

DOE PAGES

Yang, Qichun; Zhang, Xuesong; Xu, Xingya; ...

2017-05-29

Riverine carbon cycling is an important, but insufficiently investigated component of the global carbon cycle. Analyses of environmental controls on riverine carbon cycling are critical for improved understanding of mechanisms regulating carbon processing and storage along the terrestrial-aquatic continuum. Here, we compile and analyze riverine dissolved organic carbon (DOC) concentration data from 1402 United States Geological Survey (USGS) gauge stations to examine the spatial variability and environmental controls of DOC concentrations in the United States (U.S.) surface waters. DOC concentrations exhibit high spatial variability, with an average of 6.42 ± 6.47 mg C/ L (Mean ± Standard Deviation). In general,more » high DOC concentrations occur in the Upper Mississippi River basin and the Southeastern U.S., while low concentrations are mainly distributed in the Western U.S. Single-factor analysis indicates that slope of drainage areas, wetlands, forests, percentage of first-order streams, and instream nutrients (such as nitrogen and phosphorus) pronouncedly influence DOC concentrations, but the explanatory power of each bivariate model is lower than 35%. Analyses based on the general multi-linear regression models suggest DOC concentrations are jointly impacted by multiple factors. Soil properties mainly show positive correlations with DOC concentrations; forest and shrub lands have positive correlations with DOC concentrations, but urban area and croplands demonstrate negative impacts; total instream phosphorus and dam density correlate positively with DOC concentrations. Notably, the relative importance of these environmental controls varies substantially across major U.S. water resource regions. In addition, DOC concentrations and environmental controls also show significant variability from small streams to large rivers, which may be caused by changing carbon sources and removal rates by river orders. In sum, our results reveal that general multi-linear regression analysis of twenty one terrestrial and aquatic environmental factors can partially explain (56%) the DOC concentration variation. In conclusion, this study highlights the complexity of the interactions among these environmental factors in determining DOC concentrations, thus calls for processes-based, non-linear methodologies to constrain uncertainties in riverine DOC cycling.« less

Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

PubMed

Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

2017-01-01

In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
Quantitative structure-retention relationship studies for taxanes including epimers and isomeric metabolites in ultra fast liquid chromatography.

PubMed

Dong, Pei-Pei; Ge, Guang-Bo; Zhang, Yan-Yan; Ai, Chun-Zhi; Li, Guo-Hui; Zhu, Liang-Liang; Luan, Hong-Wei; Liu, Xing-Bao; Yang, Ling

2009-10-16

Seven pairs of epimers and one pair of isomeric metabolites of taxanes, each pair of which have similar structures but different retention behaviors, together with additional 13 taxanes with different substitutions were chosen to investigate the quantitative structure-retention relationship (QSRR) of taxanes in ultra fast liquid chromatography (UFLC). Monte Carlo variable selection (MCVS) method was adopted to choose descriptors. The selected four descriptors were used to build QSRR model with multi-linear regression (MLR) and artificial neural network (ANN) modeling techniques. Both linear and nonlinear models show good predictive ability, of which ANN model was better with the determination coefficient R(2) for training, validation and test set being 0.9892, 0.9747 and 0.9840, respectively. The results of 100 times' leave-12-out cross validation showed the robustness of this model. All the isomers can be correctly differentiated by this model. According to the selected descriptors, the three dimensional structural information was critical for recognition of epimers. Hydrophobic interaction was the uppermost factor for retention in UFLC. Molecules' polarizability and polarity properties were also closely correlated with retention behaviors. This QSRR model will be useful for separation and identification of taxanes including epimers and metabolites from botanical or biological samples.
Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis

ERIC Educational Resources Information Center

Luo, Wen; Azen, Razia

2013-01-01

Dominance analysis (DA) is a method used to evaluate the relative importance of predictors that was originally proposed for linear regression models. This article proposes an extension of DA that allows researchers to determine the relative importance of predictors in hierarchical linear models (HLM). Commonly used measures of model adequacy in…
Quantile Regression in the Study of Developmental Sciences

PubMed Central

Petscher, Yaacov; Logan, Jessica A. R.

2014-01-01

Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
Modelling daily water temperature from air temperature for the Missouri River.

PubMed

Zhu, Senlin; Nyarko, Emmanuel Karlo; Hadzima-Nyarko, Marijana

2018-01-01

The bio-chemical and physical characteristics of a river are directly affected by water temperature, which thereby affects the overall health of aquatic ecosystems. It is a complex problem to accurately estimate water temperature. Modelling of river water temperature is usually based on a suitable mathematical model and field measurements of various atmospheric factors. In this article, the air-water temperature relationship of the Missouri River is investigated by developing three different machine learning models (Artificial Neural Network (ANN), Gaussian Process Regression (GPR), and Bootstrap Aggregated Decision Trees (BA-DT)). Standard models (linear regression, non-linear regression, and stochastic models) are also developed and compared to machine learning models. Analyzing the three standard models, the stochastic model clearly outperforms the standard linear model and nonlinear model. All the three machine learning models have comparable results and outperform the stochastic model, with GPR having slightly better results for stations No. 2 and 3, while BA-DT has slightly better results for station No. 1. The machine learning models are very effective tools which can be used for the prediction of daily river temperature.
Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

PubMed

Madarang, Krish J; Kang, Joo-Hyon

2014-06-01

Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Predicting U.S. Army Reserve Unit Manning Using Market Demographics

DTIC Science & Technology

2015-06-01

develops linear regression , classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model. 14. SUBJECT TERMS U.S
Multi-Parameter Linear Least-Squares Fitting to Poisson Data One Count at a Time

NASA Technical Reports Server (NTRS)

Wheaton, W.; Dunklee, A.; Jacobson, A.; Ling, J.; Mahoney, W.; Radocinski, R.

1993-01-01

A standard problem in gamma-ray astronomy data analysis is the decomposition of a set of observed counts, described by Poisson statistics, according to a given multi-component linear model, with underlying physical count rates or fluxes which are to be estimated from the data.
Regression assumptions in clinical psychology research practice-a systematic review of common misconceptions.

PubMed

Ernst, Anja F; Albers, Casper J

2017-01-01

Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.
Regression assumptions in clinical psychology research practice—a systematic review of common misconceptions

PubMed Central

Ernst, Anja F.

2017-01-01

Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971
The role of zonal flows in the saturation of multi-scale gyrokinetic turbulence

DOE Office of Scientific and Technical Information (OSTI.GOV)

Staebler, G. M.; Candy, J.; Howard, N. T.

2016-06-15

The 2D spectrum of the saturated electric potential from gyrokinetic turbulence simulations that include both ion and electron scales (multi-scale) in axisymmetric tokamak geometry is analyzed. The paradigm that the turbulence is saturated when the zonal (axisymmetic) ExB flow shearing rate competes with linear growth is shown to not apply to the electron scale turbulence. Instead, it is the mixing rate by the zonal ExB velocity spectrum with the turbulent distribution function that competes with linear growth. A model of this mechanism is shown to be able to capture the suppression of electron-scale turbulence by ion-scale turbulence and the thresholdmore » for the increase in electron scale turbulence when the ion-scale turbulence is reduced. The model computes the strength of the zonal flow velocity and the saturated potential spectrum from the linear growth rate spectrum. The model for the saturated electric potential spectrum is applied to a quasilinear transport model and shown to accurately reproduce the electron and ion energy fluxes of the non-linear gyrokinetic multi-scale simulations. The zonal flow mixing saturation model is also shown to reproduce the non-linear upshift in the critical temperature gradient caused by zonal flows in ion-scale gyrokinetic simulations.« less
The role of zonal flows in the saturation of multi-scale gyrokinetic turbulence

DOE PAGES

Staebler, Gary M.; Candy, John; Howard, Nathan T.; ...

2016-06-29

The 2D spectrum of the saturated electric potential from gyrokinetic turbulence simulations that include both ion and electron scales (multi-scale) in axisymmetric tokamak geometry is analyzed. The paradigm that the turbulence is saturated when the zonal (axisymmetic) ExB flow shearing rate competes with linear growth is shown to not apply to the electron scale turbulence. Instead, it is the mixing rate by the zonal ExB velocity spectrum with the turbulent distribution function that competes with linear growth. A model of this mechanism is shown to be able to capture the suppression of electron-scale turbulence by ion-scale turbulence and the thresholdmore » for the increase in electron scale turbulence when the ion-scale turbulence is reduced. The model computes the strength of the zonal flow velocity and the saturated potential spectrum from the linear growth rate spectrum. The model for the saturated electric potential spectrum is applied to a quasilinear transport model and shown to accurately reproduce the electron and ion energy fluxes of the non-linear gyrokinetic multi-scale simulations. Finally, the zonal flow mixing saturation model is also shown to reproduce the non-linear upshift in the critical temperature gradient caused by zonal flows in ionscale gyrokinetic simulations.« less
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES

PubMed Central

Zhu, Liping; Huang, Mian; Li, Runze

2012-01-01

This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

NASA Astrophysics Data System (ADS)

Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

2017-12-01

The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Logistic models--an odd(s) kind of regression.

PubMed

Jupiter, Daniel C

2013-01-01

The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions

PubMed Central

Fernandes, Bruno J. T.; Roque, Alexandre

2018-01-01

Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
Modeling rainfall-runoff process using soft computing techniques

NASA Astrophysics Data System (ADS)

Kisi, Ozgur; Shiri, Jalal; Tombul, Mustafa

2013-02-01

Rainfall-runoff process was modeled for a small catchment in Turkey, using 4 years (1987-1991) of measurements of independent variables of rainfall and runoff values. The models used in the study were Artificial Neural Networks (ANNs), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Gene Expression Programming (GEP) which are Artificial Intelligence (AI) approaches. The applied models were trained and tested using various combinations of the independent variables. The goodness of fit for the model was evaluated in terms of the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), coefficient of efficiency (CE) and scatter index (SI). A comparison was also made between these models and traditional Multi Linear Regression (MLR) model. The study provides evidence that GEP (with RMSE=17.82 l/s, MAE=6.61 l/s, CE=0.72 and R2=0.978) is capable of modeling rainfall-runoff process and is a viable alternative to other applied artificial intelligence and MLR time-series methods.
Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.

PubMed

Trninić, Marko; Jeličić, Mario; Papić, Vladan

2015-07-01

In kinesiology, medicine, biology and psychology, in which research focus is on dynamical self-organized systems, complex connections exist between variables. Non-linear nature of complex systems has been discussed and explained by the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided in three groups according to their playing time (8 minutes and more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear (general model) and non-linear (general model) regression models were calculated simultaneously and separately for each group. The conclusion is viable: non-linear regressions are frequently superior to linear correlations when interpreting actual association logic among research variables.
QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

PubMed

Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

2012-12-01

A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.

PubMed

Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W

1992-03-01

The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy for estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by Additive Main effects and the Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.

Patterns of medicinal plant use: an examination of the Ecuadorian Shuar medicinal flora using contingency table and binomial analyses.

PubMed

Bennett, Bradley C; Husby, Chad E

2008-03-28

Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) to from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Support vector regression-guided unravelling: antioxidant capacity and quantitative structure-activity relationship predict reduction and promotion effects of flavonoids on acrylamide formation

PubMed Central

Huang, Mengmeng; Wei, Yan; Wang, Jun; Zhang, Yu

2016-01-01

We used the support vector regression (SVR) approach to predict and unravel reduction/promotion effect of characteristic flavonoids on the acrylamide formation under a low-moisture Maillard reaction system. Results demonstrated the reduction/promotion effects by flavonoids at addition levels of 1–10000 μmol/L. The maximal inhibition rates (51.7%, 68.8% and 26.1%) and promote rates (57.7%, 178.8% and 27.5%) caused by flavones, flavonols and isoflavones were observed at addition levels of 100 μmol/L and 10000 μmol/L, respectively. The reduction/promotion effects were closely related to the change of trolox equivalent antioxidant capacity (ΔTEAC) and well predicted by triple ΔTEAC measurements via SVR models (R: 0.633–0.900). Flavonols exhibit stronger effects on the acrylamide formation than flavones and isoflavones as well as their O-glycosides derivatives, which may be attributed to the number and position of phenolic and 3-enolic hydroxyls. The reduction/promotion effects were well predicted by using optimized quantitative structure-activity relationship (QSAR) descriptors and SVR models (R: 0.926–0.994). Compared to artificial neural network and multi-linear regression models, SVR models exhibited better fitting performance for both TEAC-dependent and QSAR descriptor-dependent predicting work. These observations demonstrated that the SVR models are competent for predicting our understanding on the future use of natural antioxidants for decreasing the acrylamide formation. PMID:27586851
Support vector regression-guided unravelling: antioxidant capacity and quantitative structure-activity relationship predict reduction and promotion effects of flavonoids on acrylamide formation

NASA Astrophysics Data System (ADS)

Huang, Mengmeng; Wei, Yan; Wang, Jun; Zhang, Yu

2016-09-01

We used the support vector regression (SVR) approach to predict and unravel reduction/promotion effect of characteristic flavonoids on the acrylamide formation under a low-moisture Maillard reaction system. Results demonstrated the reduction/promotion effects by flavonoids at addition levels of 1-10000 μmol/L. The maximal inhibition rates (51.7%, 68.8% and 26.1%) and promote rates (57.7%, 178.8% and 27.5%) caused by flavones, flavonols and isoflavones were observed at addition levels of 100 μmol/L and 10000 μmol/L, respectively. The reduction/promotion effects were closely related to the change of trolox equivalent antioxidant capacity (ΔTEAC) and well predicted by triple ΔTEAC measurements via SVR models (R: 0.633-0.900). Flavonols exhibit stronger effects on the acrylamide formation than flavones and isoflavones as well as their O-glycosides derivatives, which may be attributed to the number and position of phenolic and 3-enolic hydroxyls. The reduction/promotion effects were well predicted by using optimized quantitative structure-activity relationship (QSAR) descriptors and SVR models (R: 0.926-0.994). Compared to artificial neural network and multi-linear regression models, SVR models exhibited better fitting performance for both TEAC-dependent and QSAR descriptor-dependent predicting work. These observations demonstrated that the SVR models are competent for predicting our understanding on the future use of natural antioxidants for decreasing the acrylamide formation.
Predicting the demand of physician workforce: an international model based on "crowd behaviors".

PubMed

Tsai, Tsuen-Chiuan; Eliasziw, Misha; Chen, Der-Fang

2012-03-26

Appropriateness of physician workforce greatly influences the quality of healthcare. When facing the crisis of physician shortages, the correction of manpower always takes an extended time period, and both the public and health personnel suffer. To calculate an appropriate number of Physician Density (PD) for a specific country, this study was designed to create a PD prediction model, based on health-related data from many countries. Twelve factors that could possibly impact physicians' demand were chosen, and data of these factors from 130 countries (by reviewing 195) were extracted. Multiple stepwise-linear regression was used to derive the PD prediction model, and a split-sample cross-validation procedure was performed to evaluate the generalizability of the results. Using data from 130 countries, with the consideration of the correlation between variables, and preventing multi-collinearity, seven out of the 12 predictor variables were selected for entry into the stepwise regression procedure. The final model was: PD = (5.014 - 0.128 × proportion under age 15 years + 0.034 × life expectancy)2, with R2 of 80.4%. Using the prediction equation, 70 countries had PDs with "negative discrepancy", while 58 had PDs with "positive discrepancy". This study provided a regression-based PD model to calculate a "norm" number of PD for a specific country. A large PD discrepancy in a country indicates the needs to examine physician's workloads and their well-being, the effectiveness/efficiency of medical care, the promotion of population health and the team resource management.
BN-FLEMOps pluvial - A probabilistic multi-variable loss estimation model for pluvial floods

NASA Astrophysics Data System (ADS)

Roezer, V.; Kreibich, H.; Schroeter, K.; Doss-Gollin, J.; Lall, U.; Merz, B.

2017-12-01

Pluvial flood events, such as in Copenhagen (Denmark) in 2011, Beijing (China) in 2012 or Houston (USA) in 2016, have caused severe losses to urban dwellings in recent years. These floods are caused by storm events with high rainfall rates well above the design levels of urban drainage systems, which lead to inundation of streets and buildings. A projected increase in frequency and intensity of heavy rainfall events in many areas and an ongoing urbanization may increase pluvial flood losses in the future. For an efficient risk assessment and adaptation to pluvial floods, a quantification of the flood risk is needed. Few loss models have been developed particularly for pluvial floods. These models usually use simple waterlevel- or rainfall-loss functions and come with very high uncertainties. To account for these uncertainties and improve the loss estimation, we present a probabilistic multi-variable loss estimation model for pluvial floods based on empirical data. The model was developed in a two-step process using a machine learning approach and a comprehensive database comprising 783 records of direct building and content damage of private households. The data was gathered through surveys after four different pluvial flood events in Germany between 2005 and 2014. In a first step, linear and non-linear machine learning algorithms, such as tree-based and penalized regression models were used to identify the most important loss influencing factors among a set of 55 candidate variables. These variables comprise hydrological and hydraulic aspects, early warning, precaution, building characteristics and the socio-economic status of the household. In a second step, the most important loss influencing variables were used to derive a probabilistic multi-variable pluvial flood loss estimation model based on Bayesian Networks. Two different networks were tested: a score-based network learned from the data and a network based on expert knowledge. Loss predictions are made through Bayesian inference using Markov chain Monte Carlo (MCMC) sampling. With the ability to cope with incomplete information and use expert knowledge, as well as inherently providing quantitative uncertainty information, it is shown that loss models based on BNs are superior to deterministic approaches for pluvial flood risk assessment.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure

PubMed Central

Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith

2017-01-01

Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
Linking brain-wide multivoxel activation patterns to behaviour: Examples from language and math.

PubMed

Raizada, Rajeev D S; Tsao, Feng-Ming; Liu, Huei-Mei; Holloway, Ian D; Ansari, Daniel; Kuhl, Patricia K

2010-05-15

A key goal of cognitive neuroscience is to find simple and direct connections between brain and behaviour. However, fMRI analysis typically involves choices between many possible options, with each choice potentially biasing any brain-behaviour correlations that emerge. Standard methods of fMRI analysis assess each voxel individually, but then face the problem of selection bias when combining those voxels into a region-of-interest, or ROI. Multivariate pattern-based fMRI analysis methods use classifiers to analyse multiple voxels together, but can also introduce selection bias via data-reduction steps as feature selection of voxels, pre-selecting activated regions, or principal components analysis. We show here that strong brain-behaviour links can be revealed without any voxel selection or data reduction, using just plain linear regression as a classifier applied to the whole brain at once, i.e. treating each entire brain volume as a single multi-voxel pattern. The brain-behaviour correlations emerged despite the fact that the classifier was not provided with any information at all about subjects' behaviour, but instead was given only the neural data and its condition-labels. Surprisingly, more powerful classifiers such as a linear SVM and regularised logistic regression produce very similar results. We discuss some possible reasons why the very simple brain-wide linear regression model is able to find correlations with behaviour that are as strong as those obtained on the one hand from a specific ROI and on the other hand from more complex classifiers. In a manner which is unencumbered by arbitrary choices, our approach offers a method for investigating connections between brain and behaviour which is simple, rigorous and direct. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Linking brain-wide multivoxel activation patterns to behaviour: Examples from language and math

PubMed Central

Raizada, Rajeev D.S.; Tsao, Feng-Ming; Liu, Huei-Mei; Holloway, Ian D.; Ansari, Daniel; Kuhl, Patricia K.

2010-01-01

A key goal of cognitive neuroscience is to find simple and direct connections between brain and behaviour. However, fMRI analysis typically involves choices between many possible options, with each choice potentially biasing any brain–behaviour correlations that emerge. Standard methods of fMRI analysis assess each voxel individually, but then face the problem of selection bias when combining those voxels into a region-of-interest, or ROI. Multivariate pattern-based fMRI analysis methods use classifiers to analyse multiple voxels together, but can also introduce selection bias via data-reduction steps as feature selection of voxels, pre-selecting activated regions, or principal components analysis. We show here that strong brain–behaviour links can be revealed without any voxel selection or data reduction, using just plain linear regression as a classifier applied to the whole brain at once, i.e. treating each entire brain volume as a single multi-voxel pattern. The brain–behaviour correlations emerged despite the fact that the classifier was not provided with any information at all about subjects' behaviour, but instead was given only the neural data and its condition-labels. Surprisingly, more powerful classifiers such as a linear SVM and regularised logistic regression produce very similar results. We discuss some possible reasons why the very simple brain-wide linear regression model is able to find correlations with behaviour that are as strong as those obtained on the one hand from a specific ROI and on the other hand from more complex classifiers. In a manner which is unencumbered by arbitrary choices, our approach offers a method for investigating connections between brain and behaviour which is simple, rigorous and direct. PMID:20132896
Liver attenuation, pericardial adipose tissue, obesity, and insulin resistance: the Multi-Ethnic Study of Atherosclerosis (MESA).

PubMed

McAuley, Paul A; Hsu, Fang-Chi; Loman, Kurt K; Carr, J Jeffrey; Budoff, Matthew J; Szklo, Moyses; Sharrett, A Richey; Ding, Jingzhong

2011-09-01

Insulin resistance is linked to general and abdominal obesity, but its relation to hepatic lipid content and pericardial adipose tissue is less clear. The purpose of this study was to examine cross-sectional associations of liver attenuation, pericardial adipose tissue, BMI, and waist circumference with insulin resistance. We measured liver attenuation and pericardial adipose tissue using the existing cardiac computed tomography scans in 5,291 individuals free of clinical cardiovascular disease and diabetes in the Multi-Ethnic Study of Atherosclerosis (MESA) during the study's baseline visit (2000-2002). Low liver attenuation was defined as the lowest quartile and high pericardial adipose tissue as the upper quartile of volume (cm(3)). We used standard clinical definitions for obesity and abdominal obesity. Insulin resistance was assessed by the homeostasis model assessment of insulin resistance (HOMA(IR)) index. In multivariate linear regression with all adiposity measures in the model simultaneously, all adiposity measures were significantly (P < 0.0001) associated with insulin resistance: regression coefficients (±s.e.) were 0.31 (±0.02) for low liver attenuation, 0.27 (±0.02) for high pericardial adipose tissue, 0.27 (±0.02) for obesity, and 0.32 (±0.02) for abdominal obesity. We found significant differences (P = 0.003) between standardized liver attenuation and insulin resistance by ethnicity: regression coefficients per 1 s.d. increment were 0.10 ± 0.01 for whites, 0.11 ± 0.02 for Chinese, 0.08 ± 0.2 for blacks, and 0.14 ± 0.01 for Hispanics. Liver attenuation and pericardial adipose tissue were associated with insulin resistance, independent of BMI and waist circumference.
Linear regression analysis of survival data with missing censoring indicators.

PubMed

Wang, Qihua; Dinse, Gregg E

2011-04-01

Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
Development of efficiency indicators of operating room management for multi-institutional comparisons.

PubMed

Tanaka, Masayuki; Lee, Jason; Ikai, Hiroshi; Imanaka, Yuichi

2013-04-01

The efficiency of a hospital's operating room (OR) management can affect its overall profitability. However, existing indicators that assess OR management efficiency do not take into account differences in hospital size, manpower and functional characteristics, thereby rendering them unsuitable for multi-institutional comparisons. The aim of this study was to develop indicators of OR management efficiency that would take into account differences in hospital size and manpower, which may then be applied to multi-institutional comparisons. Using administrative data from 224 hospitals in Japan from 2008 to 2010, we performed four multiple linear regression analyses at the hospital level, in which the dependent variables were the number of operations per OR per month, procedural fees per OR per month, total utilization times per OR per month and total fees per OR per month for each of the models. The expected values of these four indicators were produced using multiple regression analysis results, adjusting for differences in hospital size and manpower, which are beyond the control of process owners' management. However, more than half of the variations in three of these four indicators were shown to be explained by differences in hospital size and manpower. Using the ratio of observed to expected values (OE ratio), as well as the difference between the two values (OE difference) allows hospitals to identify weaknesses in efficiency with more validity when compared to unadjusted indicators. The new indicators may support the improvement and sustainment of a high-quality health care system. © 2012 Blackwell Publishing Ltd.
Analytical framework for reconstructing heterogeneous environmental variables from mammal community structure.

PubMed

Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C

2015-01-01

We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.
A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.

PubMed

Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit

2018-06-01

In this study, when the assumptions of linearity and homogeneity of regression slopes of conventional ANCOVA are not met, a new approach named as SEYHAN has been suggested to use conventional ANCOVA instead of robust or nonlinear ANCOVA. The proposed SEYHAN's approach involves transformation of continuous covariate into categorical structure when the relationship between covariate and dependent variable is nonlinear and the regression slopes are not homogenous. A simulated data set was used to explain SEYHAN's approach. In this approach, we performed conventional ANCOVA in each subgroup which is constituted according to knot values and analysis of variance with two-factor model after MARS method was used for categorization of covariate. The first model is a simpler model than the second model that includes interaction term. Since the model with interaction effect has more subjects, the power of test also increases and the existing significant difference is revealed better. We can say that linearity and homogeneity of regression slopes are not problem for data analysis by conventional linear ANCOVA model by helping this approach. It can be used fast and efficiently for the presence of one or more covariates.
MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes.

PubMed

Kim, Sungjin; Jinich, Adrián; Aspuru-Guzik, Alán

2017-04-24

We propose a multiple descriptor multiple kernel (MultiDK) method for efficient molecular discovery using machine learning. We show that the MultiDK method improves both the speed and accuracy of molecular property prediction. We apply the method to the discovery of electrolyte molecules for aqueous redox flow batteries. Using multiple-type-as opposed to single-type-descriptors, we obtain more relevant features for machine learning. Following the principle of "wisdom of the crowds", the combination of multiple-type descriptors significantly boosts prediction performance. Moreover, by employing multiple kernels-more than one kernel function for a set of the input descriptors-MultiDK exploits nonlinear relations between molecular structure and properties better than a linear regression approach. The multiple kernels consist of a Tanimoto similarity kernel and a linear kernel for a set of binary descriptors and a set of nonbinary descriptors, respectively. Using MultiDK, we achieve an average performance of r 2 = 0.92 with a test set of molecules for solubility prediction. We also extend MultiDK to predict pH-dependent solubility and apply it to a set of quinone molecules with different ionizable functional groups to assess their performance as flow battery electrolytes.
Examining the influence of link function misspecification in conventional regression models for developing crash modification factors.

PubMed

Wu, Lingtao; Lord, Dominique

2017-05-01

This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on the misspecification in the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models or GLMs with additive linear link functions) when some of the variables have nonlinear relationships and quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios. The results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially around areas away from the baseline conditions (e.g., boundary areas); (2) with the increase in nonlinearity (i.e., nonlinear relationship becomes stronger), the bias becomes more significant; (3) the quality of CMFs for other variables having linear relationships can be influenced when mixed with those having nonlinear relationships, but the accuracy may still be acceptable; and (4) the misuse of the link function for one or more variables can also lead to biased estimates for other parameters. This study raised the importance of the link function when using regression models for developing CMFs. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

PubMed Central

Guan, Yongtao; Li, Yehua; Sinha, Rajita

2011-01-01

In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
An empirical study of statistical properties of variance partition coefficients for multi-level logistic regression models

USGS Publications Warehouse

Li, Ji; Gray, B.R.; Bates, D.M.

2008-01-01

Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPC obtained using Laplace and penalized quasilikehood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends an immediate support to wider applications of VPC in scientific data analysis.
Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

PubMed

Ng, Kar Yong; Awang, Norhashidah

2018-01-06

Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.
Feedback linearizing control of a MIMO power system

NASA Astrophysics Data System (ADS)

Ilyes, Laszlo

Prior research has demonstrated that either the mechanical or electrical subsystem of a synchronous electric generator may be controlled using single-input single-output (SISO) nonlinear feedback linearization. This research suggests a new approach which applies nonlinear feedback linearization to a multi-input multi-output (MIMO) model of the synchronous electric generator connected to an infinite bus load model. In this way, the electrical and mechanical subsystems may be linearized and simultaneously decoupled through the introduction of a pair of auxiliary inputs. This allows well known, linear, SISO control methods to be effectively applied to the resulting systems. The derivation of the feedback linearizing control law is presented in detail, including a discussion on the use of symbolic math processing as a development tool. The linearizing and decoupling properties of the control law are validated through simulation. And finally, the robustness of the control law is demonstrated.
Prenatal Phthalate, Perfluoroalkyl Acid, and Organochlorine Exposures and Term Birth Weight in Three Birth Cohorts: Multi-Pollutant Models Based on Elastic Net Regression

PubMed Central

Lenters, Virissa; Portengen, Lützen; Rignell-Hydbom, Anna; Jönsson, Bo A.G.; Lindh, Christian H.; Piersma, Aldert H.; Toft, Gunnar; Bonde, Jens Peter; Heederik, Dick; Rylander, Lars; Vermeulen, Roel

2015-01-01

Background Some legacy and emerging environmental contaminants are suspected risk factors for intrauterine growth restriction. However, the evidence is equivocal, in part due to difficulties in disentangling the effects of mixtures. Objectives We assessed associations between multiple correlated biomarkers of environmental exposure and birth weight. Methods We evaluated a cohort of 1,250 term (≥ 37 weeks gestation) singleton infants, born to 513 mothers from Greenland, 180 from Poland, and 557 from Ukraine, who were recruited during antenatal care visits in 2002‒2004. Secondary metabolites of diethylhexyl and diisononyl phthalates (DEHP, DiNP), eight perfluoroalkyl acids, and organochlorines (PCB-153 and p,p´-DDE) were quantifiable in 72‒100% of maternal serum samples. We assessed associations between exposures and term birth weight, adjusting for co-exposures and covariates, including prepregnancy body mass index. To identify independent associations, we applied the elastic net penalty to linear regression models. Results Two phthalate metabolites (MEHHP, MOiNP), perfluorooctanoic acid (PFOA), and p,p´-DDE were most consistently predictive of term birth weight based on elastic net penalty regression. In an adjusted, unpenalized regression model of the four exposures, 2-SD increases in natural log–transformed MEHHP, PFOA, and p,p´-DDE were associated with lower birth weight: –87 g (95% CI: –137, –340 per 1.70 ng/mL), –43 g (95% CI: –108, 23 per 1.18 ng/mL), and –135 g (95% CI: –192, –78 per 1.82 ng/g lipid), respectively; and MOiNP was associated with higher birth weight (46 g; 95% CI: –5, 97 per 2.22 ng/mL). Conclusions This study suggests that several of the environmental contaminants, belonging to three chemical classes, may be independently associated with impaired fetal growth. These results warrant follow-up in other cohorts. Citation Lenters V, Portengen L, Rignell-Hydbom A, Jönsson BA, Lindh CH, Piersma AH, Toft G, Bonde JP, Heederik D, Rylander L, Vermeulen R. 2016. Prenatal phthalate, perfluoroalkyl acid, and organochlorine exposures and term birth weight in three birth cohorts: multi-pollutant models based on elastic net regression. Environ Health Perspect 124:365–372; http://dx.doi.org/10.1289/ehp.1408933 PMID:26115335

Assessing the role of feed water constituents in irreversible membrane fouling of pilot-scale ultrafiltration drinking water treatment systems.

PubMed

Peiris, R H; Jaklewicz, M; Budman, H; Legge, R L; Moresoli, C

2013-06-15

Fluorescence excitation-emission matrix (EEM) approach together with principal component analysis (PCA) was used for assessing hydraulically irreversible fouling of three pilot-scale ultrafiltration (UF) systems containing full-scale and bench-scale hollow fiber membrane modules in drinking water treatment. These systems were operated for at least three months with extensive cycles of permeation, combination of back-pulsing and scouring and chemical cleaning. The principal component (PC) scores generated from the PCA of the fluorescence EEMs were found to be related to humic substances (HS), protein-like and colloidal/particulate matter content. PC scores of HS- and protein-like matter of the UF feed water, when considered separately, showed reasonably good correlations with the rate of hydraulically irreversible fouling for long-term UF operations. In contrast, comparatively weaker correlations for PC scores of colloidal/particulate matter and the rate of hydraulically irreversible fouling were obtained for all UF systems. Since, individual correlations could not fully explain the evolution of the rate of irreversible fouling, multi-linear regression models were developed to relate the combined effect of HS-like, protein-like and colloidal/particulate matter PC scores to the rate of hydraulically irreversible fouling for each specific UF system. These multi-linear regression models revealed significant individual and combined contribution of HS- and protein-like matter to the rate of hydraulically irreversible fouling, with protein-like matter generally showing the greatest contribution. The contribution of colloidal/particulate matter to the rate of hydraulically irreversible fouling was not as significant. The addition of polyaluminum chloride, as coagulant, to UF feed appeared to have a positive impact in reducing hydraulically irreversible fouling by these constituents. The proposed approach has applications in quantifying the individual and synergistic contribution of major natural water constituents to the rate of hydraulically irreversible membrane fouling and shows potential for controlling UF irreversible fouling in the production of drinking water. Copyright © 2013 Elsevier Ltd. All rights reserved.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics

NASA Astrophysics Data System (ADS)

Iyer, V.; Shetty, S.; Iyengar, S. S.

2015-07-01

Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
Analysis of Binary Adherence Data in the Setting of Polypharmacy: A Comparison of Different Approaches

PubMed Central

Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.

2009-01-01

Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
Three-Dimensional City Determinants of the Urban Heat Island: A Statistical Approach

NASA Astrophysics Data System (ADS)

Chun, Bum Seok

There is no doubt that the Urban Heat Island (UHI) is a mounting problem in built-up environments, due to the energy retention by the surface materials of dense buildings, leading to increased temperatures, air pollution, and energy consumption. Much of the earlier research on the UHI has used two-dimensional (2-D) information, such as land uses and the distribution of vegetation. In the case of homogeneous land uses, it is possible to predict surface temperatures with reasonable accuracy with 2-D information. However, three-dimensional (3-D) information is necessary to analyze more complex sites, including dense building clusters. Recent research on the UHI has started to consider multi-dimensional models. The purpose of this research is to explore the urban determinants of the UHI, using 2-D/3-D urban information with statistical modeling. The research includes the following stages: (a) estimating urban temperature, using satellite images, (b) developing a 3-D city model by LiDAR data, (c) generating geometric parameters with regard to 2-/3-D geospatial information, and (d) conducting different statistical analyses: OLS and spatial regressions. The research area is part of the City of Columbus, Ohio. To effectively and systematically analyze the UHI, hierarchical grid scales (480m, 240m, 120m, 60m, and 30m) are proposed, together with linear and the log-linear regression models. The non-linear OLS models with Log(AST) as dependent variable have the highest R2 among all the OLS-estimated models. However, both SAR and GSM models are estimated for the 480m, 240m, 120m, and 60m grids to reduce their spatial dependency. Most GSM models have R2s higher than 0.9, except for the 240m grid. Overall, the urban characteristics having high impacts in all grids are embodied in solar radiation, 3-D open space, greenery, and water streams. These results demonstrate that it is possible to mitigate the UHI, providing guidelines for policies aiming to reduce the UHI.
Quantification of endocrine disruptors and pesticides in water by gas chromatography-tandem mass spectrometry. Method validation using weighted linear regression schemes.

PubMed

Mansilha, C; Melo, A; Rebelo, H; Ferreira, I M P L V O; Pinho, O; Domingues, V; Pinho, C; Gameiro, P

2010-10-22

A multi-residue methodology based on a solid phase extraction followed by gas chromatography-tandem mass spectrometry was developed for trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine disrupting properties. Matrix standard calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample to compensate matrix-induced chromatographic response enhancement observed for certain pesticides. Validation was done mainly according to the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticides analysis and/or GC-MS methodology. As the assumption of homoscedasticity was not met for analytical data, weighted least squares linear regression procedure was applied as a simple and effective way to counteract the greater influence of the greater concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limit of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness. Copyright © 2010 Elsevier B.V. All rights reserved.
Quantification of trace metals in infant formula premixes using laser-induced breakdown spectroscopy

NASA Astrophysics Data System (ADS)

Cama-Moncunill, Raquel; Casado-Gavalda, Maria P.; Cama-Moncunill, Xavier; Markiewicz-Keszycka, Maria; Dixit, Yash; Cullen, Patrick J.; Sullivan, Carl

2017-09-01

Infant formula is a human milk substitute generally based upon fortified cow milk components. In order to mimic the composition of breast milk, trace elements such as copper, iron and zinc are usually added in a single operation using a premix. The correct addition of premixes must be verified to ensure that the target levels in infant formulae are achieved. In this study, a laser-induced breakdown spectroscopy (LIBS) system was assessed as a fast validation tool for trace element premixes. LIBS is a promising emission spectroscopic technique for elemental analysis, which offers real-time analyses, little to no sample preparation and ease of use. LIBS was employed for copper and iron determinations of premix samples ranging approximately from 0 to 120 mg/kg Cu/1640 mg/kg Fe. LIBS spectra are affected by several parameters, hindering subsequent quantitative analyses. This work aimed at testing three matrix-matched calibration approaches (simple-linear regression, multi-linear regression and partial least squares regression (PLS)) as means for precision and accuracy enhancement of LIBS quantitative analysis. All calibration models were first developed using a training set and then validated with an independent test set. PLS yielded the best results. For instance, the PLS model for copper provided a coefficient of determination (R2) of 0.995 and a root mean square error of prediction (RMSEP) of 14 mg/kg. Furthermore, LIBS was employed to penetrate through the samples by repetitively measuring the same spot. Consequently, LIBS spectra can be obtained as a function of sample layers. This information was used to explore whether measuring deeper into the sample could reduce possible surface-contaminant effects and provide better quantifications.
Can multi-slice or navigator-gated R2* MRI replace single-slice breath-hold acquisition for hepatic iron quantification?

PubMed

Loeffler, Ralf B; McCarville, M Beth; Wagstaff, Anne W; Smeltzer, Matthew P; Krafft, Axel J; Song, Ruitian; Hankins, Jane S; Hillenbrand, Claudia M

2017-01-01

Liver R2* values calculated from multi-gradient echo (mGRE) magnetic resonance images (MRI) are strongly correlated with hepatic iron concentration (HIC) as shown in several independently derived biopsy calibration studies. These calibrations were established for axial single-slice breath-hold imaging at the location of the portal vein. Scanning in multi-slice mode makes the exam more efficient, since whole-liver coverage can be achieved with two breath-holds and the optimal slice can be selected afterward. Navigator echoes remove the need for breath-holds and allow use in sedated patients. To evaluate if the existing biopsy calibrations can be applied to multi-slice and navigator-controlled mGRE imaging in children with hepatic iron overload, by testing if there is a bias-free correlation between single-slice R2* and multi-slice or multi-slice navigator controlled R2*. This study included MRI data from 71 patients with transfusional iron overload, who received an MRI exam to estimate HIC using gradient echo sequences. Patient scans contained 2 or 3 of the following imaging methods used for analysis: single-slice images (n = 71), multi-slice images (n = 69) and navigator-controlled images (n = 17). Small and large blood corrected region of interests were selected on axial images of the liver to obtain R2* values for all data sets. Bland-Altman and linear regression analysis were used to compare R2* values from single-slice images to those of multi-slice images and navigator-controlled images. Bland-Altman analysis showed that all imaging method comparisons were strongly associated with each other and had high correlation coefficients (0.98 ≤ r ≤ 1.00) with P-values ≤0.0001. Linear regression yielded slopes that were close to 1. We found that navigator-gated or breath-held multi-slice R2* MRI for HIC determination measures R2* values comparable to the biopsy-validated single-slice, single breath-hold scan. We conclude that these three R2* methods can be interchangeably used in existing R2*-HIC calibrations.
Experimental and computational prediction of glass transition temperature of drugs.

PubMed

Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S

2014-12-22

Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
Development of non-linear models predicting daily fine particle concentrations using aerosol optical depth retrievals and ground-based measurements at a municipality in the Brazilian Amazon region

NASA Astrophysics Data System (ADS)

Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino

2018-07-01

Epidemiological studies generally use particulate matter measurements with diameter less 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data has considerable potential in predicting PM2.5 concentrations, and thus provides an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, sine, cosine of date in a period of 365,25 days and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products also in combination with the testing of a non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of Brazilian Amazon Region.
Time-resolved perfusion imaging at the angiography suite: preclinical comparison of a new flat-detector application to computed tomography perfusion.

PubMed

Jürgens, Julian H W; Schulz, Nadine; Wybranski, Christian; Seidensticker, Max; Streit, Sebastian; Brauner, Jan; Wohlgemuth, Walter A; Deuerling-Zheng, Yu; Ricke, Jens; Dudeck, Oliver

2015-02-01

The objective of this study was to compare the parameter maps of a new flat-panel detector application for time-resolved perfusion imaging in the angiography room (FD-CTP) with computed tomography perfusion (CTP) in an experimental tumor model. Twenty-four VX2 tumors were implanted into the hind legs of 12 rabbits. Three weeks later, FD-CTP (Artis zeego; Siemens) and CTP (SOMATOM Definition AS +; Siemens) were performed. The parameter maps for the FD-CTP were calculated using a prototype software, and those for the CTP were calculated with VPCT-body software on a dedicated syngo MultiModality Workplace. The parameters were compared using Pearson product-moment correlation coefficient and linear regression analysis. The Pearson product-moment correlation coefficient showed good correlation values for both the intratumoral blood volume of 0.848 (P < 0.01) and the blood flow of 0.698 (P < 0.01). The linear regression analysis of the perfusion between FD-CTP and CTP showed for the blood volume a regression equation y = 4.44x + 36.72 (P < 0.01) and for the blood flow y = 0.75x + 14.61 (P < 0.01). This preclinical study provides evidence that FD-CTP allows a time-resolved (dynamic) perfusion imaging of tumors similar to CTP, which provides the basis for clinical applications such as the assessment of tumor response to locoregional therapies directly in the angiography suite.
Precision Efficacy Analysis for Regression.

ERIC Educational Resources Information Center

Brooks, Gordon P.

When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Harlim, John, E-mail: jharlim@psu.edu; Mahdi, Adam, E-mail: amahdi@ncsu.edu; Majda, Andrew J., E-mail: jonjon@cims.nyu.edu

2014-01-15

A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partialmore » noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.« less
Application of linear regression analysis in accuracy assessment of rolling force calculations

NASA Astrophysics Data System (ADS)

Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.

1998-10-01

Efficient operation of the computational models employed in process control systems require periodical assessment of the accuracy of their predictions. Linear regression is proposed as a tool which allows separate systematic and random prediction errors from those related to measurements. A quantitative characteristic of the model predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example for the application. However, the outlined approach can be used to assess the performance of any computational model.
Nonlinear aeroservoelastic analysis of a controlled multiple-actuated-wing model with free-play

NASA Astrophysics Data System (ADS)

Huang, Rui; Hu, Haiyan; Zhao, Yonghui

2013-10-01

In this paper, the effects of structural nonlinearity due to free-play in both leading-edge and trailing-edge outboard control surfaces on the linear flutter control system are analyzed for an aeroelastic model of three-dimensional multiple-actuated-wing. The free-play nonlinearities in the control surfaces are modeled theoretically by using the fictitious mass approach. The nonlinear aeroelastic equations of the presented model can be divided into nine sub-linear modal-based aeroelastic equations according to the different combinations of deflections of the leading-edge and trailing-edge outboard control surfaces. The nonlinear aeroelastic responses can be computed based on these sub-linear aeroelastic systems. To demonstrate the effects of nonlinearity on the linear flutter control system, a single-input and single-output controller and a multi-input and multi-output controller are designed based on the unconstrained optimization techniques. The numerical results indicate that the free-play nonlinearity can lead to either limit cycle oscillations or divergent motions when the linear control system is implemented.
Mathematical model for the contribution of individual organs to non-zero y-intercepts in single and multi-compartment linear models of whole-body energy expenditure.

PubMed

Kaiyala, Karl J

2014-01-01

Mathematical models for the dependence of energy expenditure (EE) on body mass and composition are essential tools in metabolic phenotyping. EE scales over broad ranges of body mass as a non-linear allometric function. When considered within restricted ranges of body mass, however, allometric EE curves exhibit 'local linearity.' Indeed, modern EE analysis makes extensive use of linear models. Such models typically involve one or two body mass compartments (e.g., fat free mass and fat mass). Importantly, linear EE models typically involve a non-zero (usually positive) y-intercept term of uncertain origin, a recurring theme in discussions of EE analysis and a source of confounding in traditional ratio-based EE normalization. Emerging linear model approaches quantify whole-body resting EE (REE) in terms of individual organ masses (e.g., liver, kidneys, heart, brain). Proponents of individual organ REE modeling hypothesize that multi-organ linear models may eliminate non-zero y-intercepts. This could have advantages in adjusting REE for body mass and composition. Studies reveal that individual organ REE is an allometric function of total body mass. I exploit first-order Taylor linearization of individual organ REEs to model the manner in which individual organs contribute to whole-body REE and to the non-zero y-intercept in linear REE models. The model predicts that REE analysis at the individual organ-tissue level will not eliminate intercept terms. I demonstrate that the parameters of a linear EE equation can be transformed into the parameters of the underlying 'latent' allometric equation. This permits estimates of the allometric scaling of EE in a diverse variety of physiological states that are not represented in the allometric EE literature but are well represented by published linear EE analyses.
Modelling hourly dissolved oxygen concentration (DO) using dynamic evolving neural-fuzzy inference system (DENFIS)-based approach: case study of Klamath River at Miller Island Boat Ramp, OR, USA.

PubMed

Heddam, Salim

2014-01-01

In this study, we present application of an artificial intelligence (AI) technique model called dynamic evolving neural-fuzzy inference system (DENFIS) based on an evolving clustering method (ECM), for modelling dissolved oxygen concentration in a river. To demonstrate the forecasting capability of DENFIS, a one year period from 1 January 2009 to 30 December 2009, of hourly experimental water quality data collected by the United States Geological Survey (USGS Station No: 420853121505500) station at Klamath River at Miller Island Boat Ramp, OR, USA, were used for model development. Two DENFIS-based models are presented and compared. The two DENFIS systems are: (1) offline-based system named DENFIS-OF, and (2) online-based system, named DENFIS-ON. The input variables used for the two models are water pH, temperature, specific conductance, and sensor depth. The performances of the models are evaluated using root mean square errors (RMSE), mean absolute error (MAE), Willmott index of agreement (d) and correlation coefficient (CC) statistics. The lowest root mean square error and highest correlation coefficient values were obtained with the DENFIS-ON method. The results obtained with DENFIS models are compared with linear (multiple linear regression, MLR) and nonlinear (multi-layer perceptron neural networks, MLPNN) methods. This study demonstrates that DENFIS-ON investigated herein outperforms all the proposed techniques for DO modelling.
Evaluation of globally available precipitation data products as input for water balance models

NASA Astrophysics Data System (ADS)

Lebrenz, H.; Bárdossy, A.

2009-04-01

Subject of this study is the evaluation of globally available precipitation data products, which are intended to be used as input variables for water balance models in ungauged basins. The selected data sources are a) the Global Precipitation Climatology Centre (GPCC), b) the Global Precipitation Climatology Project (GPCP) and c) the Climate Research Unit (CRU), resulting into twelve globally available data products. The data products imply different data bases, different derivation routines and varying resolutions in time and space. For validation purposes, the ground data from South Africa were screened on homogeneity and consistency by various tests and an outlier detection using multi-linear regression was performed. External Drift Kriging was subsequently applied on the ground data and the resulting precipitation arrays were compared to the different products with respect to quantity and variance.
Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

NASA Astrophysics Data System (ADS)

Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

2017-12-01

An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.

PubMed

Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C

2014-03-01

In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
Spatio-temporal water quality mapping from satellite images using geographically and temporally weighted regression

NASA Astrophysics Data System (ADS)

Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua

2018-03-01

The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.

Quality of life in breast cancer patients--a quantile regression analysis.

PubMed

Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma

2008-01-01

Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.
A computer tool for a minimax criterion in binary response and heteroscedastic simple linear regression models.

PubMed

Casero-Alonso, V; López-Fidalgo, J; Torsney, B

2017-01-01

Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright Â© 2016 Elsevier Ireland Ltd. All rights reserved.
Penalized spline estimation for functional coefficient regression models.

PubMed

Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

2010-04-01

The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.
Product unit neural network models for predicting the growth limits of Listeria monocytogenes.

PubMed

Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G

2007-08-01

A new approach to predict the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed and a non-linear model was obtained by adding new variables by means of a Neural Network model based on Product Units (PUNN). The classification efficiency of the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) were compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model, 80%. On the other hand predictions of LRPU were closer to data observed which permits to design proper formulations in minimally processed foods. This novel methodology can be applied to predictive microbiology for describing growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance is trying to find models with an acceptable interpretation capacity and with good ability to fit the data on the boundaries of variable range. The results obtained conclude that these kinds of models might well be very a valuable tool for mathematical modeling.
Non-Linear Relationship between Economic Growth and CO2 Emissions in China: An Empirical Study Based on Panel Smooth Transition Regression Models

PubMed Central

Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi

2017-01-01

The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO2 emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO2 emissions is significantly higher than those of GDPpc and Es on per capita CO2 emissions. PMID:29236083
Non-Linear Relationship between Economic Growth and CO₂ Emissions in China: An Empirical Study Based on Panel Smooth Transition Regression Models.

PubMed

Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi

2017-12-13

The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO₂ emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO₂ emissions is significantly higher than those of GDPpc and Es on per capita CO₂ emissions.
Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

NASA Astrophysics Data System (ADS)

Gusriani, N.; Firdaniza

2018-03-01

The existence of outliers on multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the Least Square method is forcedly used on these data, it will produce a model that cannot represent most data. For that, we need a robust regression method against outliers. This paper will compare the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contains outliers. Based on the robust determinant coefficient value, MCD method produces a better model compared to TELBS method.
BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER’S DISEASE*

PubMed Central

Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G

2015-01-01

The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412
Ridge Regression for Interactive Models.

ERIC Educational Resources Information Center

Tate, Richard L.

1988-01-01

An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Reasons for Hierarchical Linear Modeling: A Reminder.

ERIC Educational Resources Information Center

Wang, Jianjun

1999-01-01

Uses examples of hierarchical linear modeling (HLM) at local and national levels to illustrate proper applications of HLM and dummy variable regression. Raises cautions about the circumstances under which hierarchical data do not need HLM. (SLD)
Image interpolation via regularized local linear regression.

PubMed

Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang

2011-12-01

The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model where we replace the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l(2)-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
Application of glas laser altimetry to detect elevation changes in East Antarctica

NASA Astrophysics Data System (ADS)

Scaioni, M.; Tong, X.; Li, R.

2013-10-01

In this paper the use of ICESat/GLAS laser altimeter for estimating multi-temporal elevation changes on polar ice sheets is afforded. Due to non-overlapping laser spots during repeat passes, interpolation methods are required to make comparisons. After reviewing the main methods described in the literature (crossover point analysis, cross-track DEM projection, space-temporal regressions), the last one has been chosen for its capability of providing more elevation change rate measurements. The standard implementation of the space-temporal linear regression technique has been revisited and improved to better cope with outliers and to check the estimability of model's parameters. GLAS data over the PANDA route in East Antarctica have been used for testing. Obtained results have been quite meaningful from a physical point of view, confirming the trend reported by the literature of a constant snow accumulation in the area during the two past decades, unlike the most part of the continent that has been losing mass.
A hybrid constructed wetland for organic-material and nutrient removal from sewage: Process performance and multi-kinetic models.

PubMed

Nguyen, X Cuong; Chang, S Woong; Nguyen, Thi Loan; Ngo, H Hao; Kumar, Gopalakrishnan; Banu, J Rajesh; Vu, M Cuong; Le, H Sinh; Nguyen, D Duc

2018-09-15

A pilot-scale hybrid constructed wetland with vertical flow and horizontal flow in series was constructed and used to investigate organic material and nutrient removal rate constants for wastewater treatment and establish a practical predictive model for use. For this purpose, the performance of multiple parameters was statistically evaluated during the process and predictive models were suggested. The measurement of the kinetic rate constant was based on the use of the first-order derivation and Monod kinetic derivation (Monod) paired with a plug flow reactor (PFR) and a continuously stirred tank reactor (CSTR). Both the Lindeman, Merenda, and Gold (LMG) analysis and Bayesian model averaging (BMA) method were employed for identifying the relative importance of variables and their optimal multiple regression (MR). The results showed that the first-order-PFR (M 2 ) model did not fit the data (P > 0.05, and R 2  < 0.5), whereas the first-order-CSTR (M 1 ) model for the chemical oxygen demand (COD Cr ) and Monod-CSTR (M 3 ) model for the COD Cr and ammonium nitrogen (NH 4 -N) showed a high correlation with the experimental data (R 2  > 0.5). The pollutant removal rates in the case of M 1 were 0.19 m/d (COD Cr ) and those for M 3 were 25.2 g/m 2 ∙d for COD Cr and 2.63 g/m 2 ∙d for NH 4 -N. By applying a multi-variable linear regression method, the optimal empirical models were established for predicting the final effluent concentration of five days' biochemical oxygen demand (BOD 5 ) and NH 4 -N. In general, the hydraulic loading rate was considered an important variable having a high value of relative importance, which appeared in all the optimal predictive models. Copyright © 2018 Elsevier Ltd. All rights reserved.
How Robust Is Linear Regression with Dummy Variables?

ERIC Educational Resources Information Center

Blankmeyer, Eric

2006-01-01

Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
Estimating effects of limiting factors with regression quantiles

USGS Publications Warehouse

Cade, B.S.; Terrell, J.W.; Schroeder, R.L.

1999-01-01

In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
INNOVATIVE INSTRUMENTATION AND ANALYSIS OF THE TEMPERATURE MEASUREMENT FOR HIGH TEMPERATURE GASIFICATION

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seong W. Lee

During this reporting period, the literature survey including the gasifier temperature measurement literature, the ultrasonic application and its background study in cleaning application, and spray coating process are completed. The gasifier simulator (cold model) testing has been successfully conducted. Four factors (blower voltage, ultrasonic application, injection time intervals, particle weight) were considered as significant factors that affect the temperature measurement. The Analysis of Variance (ANOVA) was applied to analyze the test data. The analysis shows that all four factors are significant to the temperature measurements in the gasifier simulator (cold model). The regression analysis for the case with the normalizedmore » room temperature shows that linear model fits the temperature data with 82% accuracy (18% error). The regression analysis for the case without the normalized room temperature shows 72.5% accuracy (27.5% error). The nonlinear regression analysis indicates a better fit than that of the linear regression. The nonlinear regression model's accuracy is 88.7% (11.3% error) for normalized room temperature case, which is better than the linear regression analysis. The hot model thermocouple sleeve design and fabrication are completed. The gasifier simulator (hot model) design and the fabrication are completed. The system tests of the gasifier simulator (hot model) have been conducted and some modifications have been made. Based on the system tests and results analysis, the gasifier simulator (hot model) has met the proposed design requirement and the ready for system test. The ultrasonic cleaning method is under evaluation and will be further studied for the gasifier simulator (hot model) application. The progress of this project has been on schedule.« less
On the use and misuse of scalar scores of confounders in design and analysis of observational studies.

PubMed

Pfeiffer, R M; Riedl, R

2015-08-15

We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

PubMed

Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

2009-11-01

G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
[Application of SAS macro to evaluated multiplicative and additive interaction in logistic and Cox regression in clinical practices].

PubMed

Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

2016-05-01

Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case control study, but Cox proportional hazard model is often used in survival data analysis. Most literature only refer to main effect model, however, generalized linear model differs from general linear model, and the interaction was composed of multiplicative interaction and additive interaction. The former is only statistical significant, but the latter has biological significance. In this paper, macros was written by using SAS 9.4 and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the items of logistic and Cox regression interactions, and the confidence intervals of Wald, delta and profile likelihood were used to evaluate additive interaction for the reference in big data analysis in clinical epidemiology and in analysis of genetic multiplicative and additive interactions.
Protein linear indices of the 'macromolecular pseudograph alpha-carbon atom adjacency matrix' in bioinformatics. Part 1: prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor.

PubMed

Marrero-Ponce, Yovani; Medina-Marrero, Ricardo; Castillo-Garit, Juan A; Romero-Zaldivar, Vicente; Torrens, Francisco; Castro, Eduardo A

2005-04-15

A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole protein) and local (one or more amino acid) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid level biochemical descriptors are based on the calculation of linear maps on Rn[f k(xmi):Rn-->Rn] in canonical basis. These bio-macromolecular indices are calculated from the kth power of the macromolecular pseudograph alpha-carbon atom adjacency matrix. Total linear indices are linear functional on Rn. That is, the kth total linear indices are linear maps from Rn to the scalar R[f k(xm):Rn-->R]. Thus, the kth total linear indices are calculated by summing the amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in the Arc repressor illustrates this approach. A quantitative model that discriminates near wild-type stability alanine mutants from the reduced-stability ones in a training series was obtained. This model permitted the correct classification of 97.56% (40/41) and 91.67% (11/12) of proteins in the training and test set, respectively. It shows a high Matthews correlation coefficient (MCC=0.952) for the training set and an MCC=0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model (Rcanc=0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. On the other hand, the linear piecewise regression model compared favorably with respect to the linear regression one on predicting the melting temperature (tm) of the Arc alanine mutants. The linear model explains almost 81% of the variance of the experimental tm (R=0.90 and s=4.29) and the LOO press statistics evidenced its predictive ability (q2=0.72 and scv=4.79). Moreover, the TOMOCOMD-CAMPS method produced a linear piecewise regression (R=0.97) between protein backbone descriptors and tm values for alanine mutants of the Arc repressor. A break-point value of 51.87 degrees C characterized two mutant clusters and coincided perfectly with the experimental scale. For this reason, we can use the linear discriminant analysis and piecewise models in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted the interpretation of the driving forces of such folding process, indicating that topologic/topographic protein backbone interactions control the stability profile of wild-type Arc and its alanine mutants.

Some comparisons of complexity in dictionary-based and linear computational models.

PubMed

Gnecco, Giorgio; Kůrková, Věra; Sanguineti, Marcello

2011-03-01

Neural networks provide a more flexible approximation of functions than traditional linear regression. In the latter, one can only adjust the coefficients in linear combinations of fixed sets of functions, such as orthogonal polynomials or Hermite functions, while for neural networks, one may also adjust the parameters of the functions which are being combined. However, some useful properties of linear approximators (such as uniqueness, homogeneity, and continuity of best approximation operators) are not satisfied by neural networks. Moreover, optimization of parameters in neural networks becomes more difficult than in linear regression. Experimental results suggest that these drawbacks of neural networks are offset by substantially lower model complexity, allowing accuracy of approximation even in high-dimensional cases. We give some theoretical results comparing requirements on model complexity for two types of approximators, the traditional linear ones and so called variable-basis types, which include neural networks, radial, and kernel models. We compare upper bounds on worst-case errors in variable-basis approximation with lower bounds on such errors for any linear approximator. Using methods from nonlinear approximation and integral representations tailored to computational units, we describe some cases where neural networks outperform any linear approximator. Copyright © 2010 Elsevier Ltd. All rights reserved.
Distributed collaborative probabilistic design of multi-failure structure with fluid-structure interaction using fuzzy neural network of regression

NASA Astrophysics Data System (ADS)

Song, Lu-Kai; Wen, Jie; Fei, Cheng-Wei; Bai, Guang-Chen

2018-05-01

To improve the computing efficiency and precision of probabilistic design for multi-failure structure, a distributed collaborative probabilistic design method-based fuzzy neural network of regression (FR) (called as DCFRM) is proposed with the integration of distributed collaborative response surface method and fuzzy neural network regression model. The mathematical model of DCFRM is established and the probabilistic design idea with DCFRM is introduced. The probabilistic analysis of turbine blisk involving multi-failure modes (deformation failure, stress failure and strain failure) was investigated by considering fluid-structure interaction with the proposed method. The distribution characteristics, reliability degree, and sensitivity degree of each failure mode and overall failure mode on turbine blisk are obtained, which provides a useful reference for improving the performance and reliability of aeroengine. Through the comparison of methods shows that the DCFRM reshapes the probability of probabilistic analysis for multi-failure structure and improves the computing efficiency while keeping acceptable computational precision. Moreover, the proposed method offers a useful insight for reliability-based design optimization of multi-failure structure and thereby also enriches the theory and method of mechanical reliability design.
Limits of detection and decision. Part 3

NASA Astrophysics Data System (ADS)

Voigtman, E.

2008-02-01

It has been shown that the MARLAP (Multi-Agency Radiological Laboratory Analytical Protocols) for estimating the Currie detection limit, which is based on 'critical values of the non-centrality parameter of the non-central t distribution', is intrinsically biased, even if no calibration curve or regression is used. This completed the refutation of the method, begun in Part 2. With the field cleared of obstructions, the true theory underlying Currie's limits of decision, detection and quantification, as they apply in a simple linear chemical measurement system (CMS) having heteroscedastic, Gaussian measurement noise and using weighted least squares (WLS) processing, was then derived. Extensive Monte Carlo simulations were performed, on 900 million independent calibration curves, for linear, "hockey stick" and quadratic noise precision models (NPMs). With errorless NPM parameters, all the simulation results were found to be in excellent agreement with the derived theoretical expressions. Even with as much as 30% noise on all of the relevant NPM parameters, the worst absolute errors in rates of false positives and false negatives, was only 0.3%.
Assessment of Various Remote Sensing Technologies in Biomass and Nitrogen Content Estimation Using AN Agricultural Test Field

NASA Astrophysics Data System (ADS)

Näsi, R.; Viljanen, N.; Kaivosoja, J.; Hakala, T.; Pandžić, M.; Markelin, L.; Honkavaara, E.

2017-10-01

Multispectral and hyperspectral imaging is usually acquired by satellite and aircraft platforms. Recently, miniaturized hyperspectral 2D frame cameras have showed great potential to precise agriculture estimations and they are feasible to combine with lightweight platforms, such as drones. Drone platform is a flexible tool for remote sensing applications with environment and agriculture. The assessment and comparison of different platforms such as satellite, aircraft and drones with different sensors, such as hyperspectral and RGB cameras is an important task in order to understand the potential of the data provided by these equipment and to select the most appropriate according to the user applications and requirements. In this context, open and permanent test fields are very significant and helpful experimental environment, since they provide a comparative data for different platforms, sensors and users, allowing multi-temporal analyses as well. Objective of this work was to investigate the feasibility of an open permanent test field in context of precision agriculture. Satellite (Sentinel-2), aircraft and drones with hyperspectral and RGB cameras were assessed in this study to estimate biomass, using linear regression models and in-situ samples. Spectral data and 3D information were used and compared in different combinations to investigate the quality of the models. The biomass estimation accuracies using linear regression models were better than 90 % for the drone based datasets. The results showed that the use of spectral and 3D features together improved the estimation model. However, estimation of nitrogen content was less accurate with the evaluated remote sensing sensors. The open and permanent test field showed to be suitable to provide an accurate and reliable reference data for the commercial users and farmers.
VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.

PubMed

Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro

2016-01-01

In healthy individuals, behavioral outcomes are highly associated with the variability on brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. VoxelStats package has been developed in Matlab(®) and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to existing methods and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.
TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis

NASA Astrophysics Data System (ADS)

Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.

2016-02-01

In this paper, the kinetic analysis of Li-Zn ferrite synthesis was studied using thermogravimetry (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG-curves obtained for the four heating rates and Netzsch Thermokinetics software package, the kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG-curves clearly suggest a two-step process for the ferrite synthesis and therefore a model-fitting kinetic analysis based on multivariate non-linear regressions was conducted. The complex reaction was described by a two-step reaction scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Yander three-dimensional diffusion model at the first stage and Ginstling-Bronstein model at the second step. The kinetic parameters for lithium-zinc ferrite synthesis reaction were found and discussed.
Modeling thermal sensation in a Mediterranean climate—a comparison of linear and ordinal models

NASA Astrophysics Data System (ADS)

Pantavou, Katerina; Lykoudis, Spyridon

2014-08-01

A simple thermo-physiological model of outdoor thermal sensation adjusted with psychological factors is developed aiming to predict thermal sensation in Mediterranean climates. Microclimatic measurements simultaneously with interviews on personal and psychological conditions were carried out in a square, a street canyon and a coastal location of the greater urban area of Athens, Greece. Multiple linear and ordinal regression were applied in order to estimate thermal sensation making allowance for all the recorded parameters or specific, empirically selected, subsets producing so-called extensive and empirical models, respectively. Meteorological, thermo-physiological and overall models - considering psychological factors as well - were developed. Predictions were improved when personal and psychological factors were taken into account as compared to meteorological models. The model based on ordinal regression reproduced extreme values of thermal sensation vote more adequately than the linear regression one, while the empirical model produced satisfactory results in relation to the extensive model. The effects of adaptation and expectation on thermal sensation vote were introduced in the models by means of the exposure time, season and preference related to air temperature and irradiation. The assessment of thermal sensation could be a useful criterion in decision making regarding public health, outdoor spaces planning and tourism.
The use of experimental design in the development of an HPLC-ECD method for the analysis of captopril.

PubMed

Khamanga, Sandile M; Walker, Roderick B

2011-01-15

An accurate, sensitive and specific high performance liquid chromatography-electrochemical detection (HPLC-ECD) method that was developed and validated for captopril (CPT) is presented. Separation was achieved using a Phenomenex(®) Luna 5 μm (C(18)) column and a mobile phase comprised of phosphate buffer (adjusted to pH 3.0): acetonitrile in a ratio of 70:30 (v/v). Detection was accomplished using a full scan multi channel ESA Coulometric detector in the "oxidative-screen" mode with the upstream electrode (E(1)) set at +600 mV and the downstream (analytical) electrode (E(2)) set at +950 mV, while the potential of the guard cell was maintained at +1050 mV. The detector gain was set at 300. Experimental design using central composite design (CCD) was used to facilitate method development. Mobile phase pH, molarity and concentration of acetonitrile (ACN) were considered the critical factors to be studied to establish the retention time of CPT and cyclizine (CYC) that was used as the internal standard. Twenty experiments including centre points were undertaken and a quadratic model was derived for the retention time for CPT using the experimental data. The method was validated for linearity, accuracy, precision, limits of quantitation and detection, as per the ICH guidelines. The system was found to produce sharp and well-resolved peaks for CPT and CYC with retention times of 3.08 and 7.56 min, respectively. Linear regression analysis for the calibration curve showed a good linear relationship with a regression coefficient of 0.978 in the concentration range of 2-70 μg/mL. The linear regression equation was y=0.0131x+0.0275. The limits of detection (LOQ) and quantitation (LOD) were found to be 2.27 and 0.6 μg/mL, respectively. The method was used to analyze CPT in tablets. The wide range for linearity, accuracy, sensitivity, short retention time and composition of the mobile phase indicated that this method is better for the quantification of CPT than the pharmacopoeial methods. Copyright © 2010 Elsevier B.V. All rights reserved.
Distributed Monitoring of the R(sup 2) Statistic for Linear Regression

NASA Technical Reports Server (NTRS)

Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.

2011-01-01

The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
Application of Multi-task Lasso Regression in the Stellar Parametrization

NASA Astrophysics Data System (ADS)

Chang, L. N.; Zhang, P. A.

2015-01-01

The multi-task learning approaches have attracted the increasing attention in the fields of machine learning, computer vision, and artificial intelligence. By utilizing the correlations in tasks, learning multiple related tasks simultaneously is better than learning each task independently. An efficient multi-task Lasso (Least Absolute Shrinkage Selection and Operator) regression algorithm is proposed in this paper to estimate the physical parameters of stellar spectra. It not only makes different physical parameters share the common features, but also can effectively preserve their own peculiar features. Experiments were done based on the ELODIE data simulated with the stellar atmospheric simulation model, and on the SDSS data released by the American large survey Sloan. The precision of the model is better than those of the methods in the related literature, especially for the acceleration of gravity (lg g) and the chemical abundance ([Fe/H]). In the experiments, we changed the resolution of the spectrum, and applied the noises with different signal-to-noise ratio (SNR) to the spectrum, so as to illustrate the stability of the model. The results show that the model is influenced by both the resolution and the noise. But the influence of the noise is larger than that of the resolution. In general, the multi-task Lasso regression algorithm is easy to operate, has a strong stability, and also can improve the overall accuracy of the model.
SU-F-J-138: An Extension of PCA-Based Respiratory Deformation Modeling Via Multi-Linear Decomposition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Iliopoulos, AS; Sun, X; Pitsianis, N

Purpose: To address and lift the limited degree of freedom (DoF) of globally bilinear motion components such as those based on principal components analysis (PCA), for encoding and modeling volumetric deformation motion. Methods: We provide a systematic approach to obtaining a multi-linear decomposition (MLD) and associated motion model from deformation vector field (DVF) data. We had previously introduced MLD for capturing multi-way relationships between DVF variables, without being restricted by the bilinear component format of PCA-based models. PCA-based modeling is commonly used for encoding patient-specific deformation as per planning 4D-CT images, and aiding on-board motion estimation during radiotherapy. However, themore » bilinear space-time decomposition inherently limits the DoF of such models by the small number of respiratory phases. While this limit is not reached in model studies using analytical or digital phantoms with low-rank motion, it compromises modeling power in the presence of relative motion, asymmetries and hysteresis, etc, which are often observed in patient data. Specifically, a low-DoF model will spuriously couple incoherent motion components, compromising its adaptability to on-board deformation changes. By the multi-linear format of extracted motion components, MLD-based models can encode higher-DoF deformation structure. Results: We conduct mathematical and experimental comparisons between PCA- and MLD-based models. A set of temporally-sampled analytical trajectories provides a synthetic, high-rank DVF; trajectories correspond to respiratory and cardiac motion factors, including different relative frequencies and spatial variations. Additionally, a digital XCAT phantom is used to simulate a lung lesion deforming incoherently with respect to the body, which adheres to a simple respiratory trend. In both cases, coupling of incoherent motion components due to a low model DoF is clearly demonstrated. Conclusion: Multi-linear decomposition can enable decoupling of distinct motion factors in high-rank DVF measurements. This may improve motion model expressiveness and adaptability to on-board deformation, aiding model-based image reconstruction for target verification. NIH Grant No. R01-184173.« less
The novel application of artificial neural network on bioelectrical impedance analysis to assess the body composition in elderly

PubMed Central

2013-01-01

Background This study aims to improve accuracy of Bioelectrical Impedance Analysis (BIA) prediction equations for estimating fat free mass (FFM) of the elderly by using non-linear Back Propagation Artificial Neural Network (BP-ANN) model and to compare the predictive accuracy with the linear regression model by using energy dual X-ray absorptiometry (DXA) as reference method. Methods A total of 88 Taiwanese elderly adults were recruited in this study as subjects. Linear regression equations and BP-ANN prediction equation were developed using impedances and other anthropometrics for predicting the reference FFM measured by DXA (FFMDXA) in 36 male and 26 female Taiwanese elderly adults. The FFM estimated by BIA prediction equations using traditional linear regression model (FFMLR) and BP-ANN model (FFMANN) were compared to the FFMDXA. The measuring results of an additional 26 elderly adults were used to validate than accuracy of the predictive models. Results The results showed the significant predictors were impedance, gender, age, height and weight in developed FFMLR linear model (LR) for predicting FFM (coefficient of determination, r2 = 0.940; standard error of estimate (SEE) = 2.729 kg; root mean square error (RMSE) = 2.571kg, P < 0.001). The above predictors were set as the variables of the input layer by using five neurons in the BP-ANN model (r2 = 0.987 with a SD = 1.192 kg and relatively lower RMSE = 1.183 kg), which had greater (improved) accuracy for estimating FFM when compared with linear model. The results showed a better agreement existed between FFMANN and FFMDXA than that between FFMLR and FFMDXA. Conclusion When compared the performance of developed prediction equations for estimating reference FFMDXA, the linear model has lower r2 with a larger SD in predictive results than that of BP-ANN model, which indicated ANN model is more suitable for estimating FFM. PMID:23388042
A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment.

PubMed

Kilian, Reinhold; Matschinger, Herbert; Löeffler, Walter; Roick, Christiane; Angermeyer, Matthias C

2002-03-01

Transformation of the dependent cost variable is often used to solve the problems of heteroscedasticity and skewness in linear ordinary least square regression of health service cost data. However, transformation may cause difficulties in the interpretation of regression coefficients and the retransformation of predicted values. The study compares the advantages and disadvantages of different methods to estimate regression based cost functions using data on the annual costs of schizophrenia treatment. Annual costs of psychiatric service use and clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. The clinical characteristics of the participants were assessed by means of the BPRS 4.0, the GAF, and the CAN for service needs. Quality of life was measured by WHOQOL-BREF. A linear OLS regression model with non-parametric standard errors, a log-transformed OLS model and a generalized linear model with a log-link and a gamma distribution were used to estimate service costs. For the estimation of robust non-parametric standard errors, the variance estimator by White and a bootstrap estimator based on 2000 replications were employed. Models were evaluated by the comparison of the R2 and the root mean squared error (RMSE). RMSE of the log-transformed OLS model was computed with three different methods of bias-correction. The 95% confidence intervals for the differences between the RMSE were computed by means of bootstrapping. A split-sample-cross-validation procedure was used to forecast the costs for the one half of the sample on the basis of a regression equation computed for the other half of the sample. All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs. Only the log- transformed OLS model showed a significant negative impact of age, and only the GLM shows a significant negative influences of employment status and partnership on costs. All three models provided a R2 of about.31. The Residuals of the linear OLS model revealed significant deviances from normality and homoscedasticity. The residuals of the log-transformed model are normally distributed but still heteroscedastic. The linear OLS model provided the lowest prediction error and the best forecast of the dependent cost variable. The log-transformed model provided the lowest RMSE if the heteroscedastic bias correction was used. The RMSE of the GLM with a log link and a gamma distribution was higher than those of the linear OLS model and the log-transformed OLS model. The difference between the RMSE of the linear OLS model and that of the log-transformed OLS model without bias correction was significant at the 95% level. As result of the cross-validation procedure, the linear OLS model provided the lowest RMSE followed by the log-transformed OLS model with a heteroscedastic bias correction. The GLM showed the weakest model fit again. None of the differences between the RMSE resulting form the cross- validation procedure were found to be significant. The comparison of the fit indices of the different regression models revealed that the linear OLS model provided a better fit than the log-transformed model and the GLM, but the differences between the models RMSE were not significant. Due to the small number of cases in the study the lack of significance does not sufficiently proof that the differences between the RSME for the different models are zero and the superiority of the linear OLS model can not be generalized. The lack of significant differences among the alternative estimators may reflect a lack of sample size adequate to detect important differences among the estimators employed. Further studies with larger case number are necessary to confirm the results. Specification of an adequate regression models requires a careful examination of the characteristics of the data. Estimation of standard errors and confidence intervals by nonparametric methods which are robust against deviations from the normal distribution and the homoscedasticity of residuals are suitable alternatives to the transformation of the skew distributed dependent variable. Further studies with more adequate case numbers are needed to confirm the results.
A simple approach to power and sample size calculations in logistic regression and Cox regression models.

PubMed

Vaeth, Michael; Skovlund, Eva

2004-06-15

For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

NASA Astrophysics Data System (ADS)

Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

2018-01-01

In a number of environmental studies, relationships between natural processes are often assessed through regression analyses, using time series data. Such data are often multi-scale and non-stationary, leading to a poor accuracy of the resulting regression models and therefore to results with moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology consisting in applying the empirical mode decomposition (EMD) algorithm on data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts of the issues of non-stationarity associated to the data series. Second, this approach acts as a scan for the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results shed new knowledge concerning the studied relationship. For instance, they show that the humidity can cause excess mortality at the monthly time scale, which is a scale not visible in classical models. A comparison is also conducted with state of the art methods which are the generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performances and provides more details than classical models concerning the relationship.
Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

PubMed

Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

2012-01-01

Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.
Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

NASA Astrophysics Data System (ADS)

Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen

2017-12-01

Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
A systematic study on the influencing parameters and improvement of quantitative analysis of multi-component with single marker method using notoginseng as research subject.

PubMed

Wang, Chao-Qun; Jia, Xiu-Hong; Zhu, Shu; Komatsu, Katsuko; Wang, Xuan; Cai, Shao-Qing

2015-03-01

A new quantitative analysis of multi-component with single marker (QAMS) method for 11 saponins (ginsenosides Rg1, Rb1, Rg2, Rh1, Rf, Re and Rd; notoginsenosides R1, R4, Fa and K) in notoginseng was established, when 6 of these saponins were individually used as internal referring substances to investigate the influences of chemical structure, concentrations of quantitative components, and purities of the standard substances on the accuracy of the QAMS method. The results showed that the concentration of the analyte in sample solution was the major influencing parameter, whereas the other parameters had minimal influence on the accuracy of the QAMS method. A new method for calculating the relative correction factors by linear regression was established (linear regression method), which demonstrated to decrease standard method differences of the QAMS method from 1.20%±0.02% - 23.29%±3.23% to 0.10%±0.09% - 8.84%±2.85% in comparison with the previous method. And the differences between external standard method and the QAMS method using relative correction factors calculated by linear regression method were below 5% in the quantitative determination of Rg1, Re, R1, Rd and Fa in 24 notoginseng samples and Rb1 in 21 notoginseng samples. And the differences were mostly below 10% in the quantitative determination of Rf, Rg2, R4 and N-K (the differences of these 4 constituents bigger because their contents lower) in all the 24 notoginseng samples. The results indicated that the contents assayed by the new QAMS method could be considered as accurate as those assayed by external standard method. In addition, a method for determining applicable concentration ranges of the quantitative components assayed by QAMS method was established for the first time, which could ensure its high accuracy and could be applied to QAMS methods of other TCMs. The present study demonstrated the practicability of the application of the QAMS method for the quantitative analysis of multi-component and the quality control of TCMs and TCM prescriptions. Copyright © 2014 Elsevier B.V. All rights reserved.
Multishaker modal testing

NASA Technical Reports Server (NTRS)

Craig, Roy R., Jr.

1987-01-01

The major accomplishments of this research are: (1) the refinement and documentation of a multi-input, multi-output modal parameter estimation algorithm which is applicable to general linear, time-invariant dynamic systems; (2) the development and testing of an unsymmetric block-Lanzcos algorithm for reduced-order modeling of linear systems with arbitrary damping; and (3) the development of a control-structure-interaction (CSI) test facility.
Modeling containment of large wildfires using generalized linear mixed-model analysis

Treesearch

Mark Finney; Isaac C. Grenfell; Charles W. McHugh

2009-01-01

Billions of dollars are spent annually in the United States to contain large wildland fires, but the factors contributing to suppression success remain poorly understood. We used a regression model (generalized linear mixed-model) to model containment probability of individual fires, assuming that containment was a repeated-measures problem (fixed effect) and...

Applying Regression Analysis to Problems in Institutional Research.

ERIC Educational Resources Information Center

Bohannon, Tom R.

1988-01-01

Regression analysis is one of the most frequently used statistical techniques in institutional research. Principles of least squares, model building, residual analysis, influence statistics, and multi-collinearity are described and illustrated. (Author/MSE)
A Demonstration of Regression False Positive Selection in Data Mining

ERIC Educational Resources Information Center

Pinder, Jonathan P.

2014-01-01

Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…
Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

NASA Astrophysics Data System (ADS)

Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

2013-02-01

Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
The Research of Regression Method for Forecasting Monthly Electricity Sales Considering Coupled Multi-factor

NASA Astrophysics Data System (ADS)

Wang, Jiangbo; Liu, Junhui; Li, Tiantian; Yin, Shuo; He, Xinhui

2018-01-01

The monthly electricity sales forecasting is a basic work to ensure the safety of the power system. This paper presented a monthly electricity sales forecasting method which comprehensively considers the coupled multi-factors of temperature, economic growth, electric power replacement and business expansion. The mathematical model is constructed by using regression method. The simulation results show that the proposed method is accurate and effective.
Bayesian multivariate hierarchical transformation models for ROC analysis.

PubMed

O'Malley, A James; Zou, Kelly H

2006-02-15

A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
Bayesian multivariate hierarchical transformation models for ROC analysis

PubMed Central

O'Malley, A. James; Zou, Kelly H.

2006-01-01

SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836
A non-linear optimization programming model for air quality planning including co-benefits for GHG emissions.

PubMed

Turrini, Enrico; Carnevale, Claudio; Finzi, Giovanna; Volta, Marialuisa

2018-04-15

This paper introduces the MAQ (Multi-dimensional Air Quality) model aimed at defining cost-effective air quality plans at different scales (urban to national) and assessing the co-benefits for GHG emissions. The model implements and solves a non-linear multi-objective, multi-pollutant decision problem where the decision variables are the application levels of emission abatement measures allowing the reduction of energy consumption, end-of pipe technologies and fuel switch options. The objectives of the decision problem are the minimization of tropospheric secondary pollution exposure and of internal costs. The model assesses CO 2 equivalent emissions in order to support decision makers in the selection of win-win policies. The methodology is tested on Lombardy region, a heavily polluted area in northern Italy. Copyright © 2017 Elsevier B.V. All rights reserved.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.

PubMed

Kong, Shengchun; Nan, Bin

2014-01-01

We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso

PubMed Central

Kong, Shengchun; Nan, Bin

2013-01-01

We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Introduction to the use of regression models in epidemiology.

PubMed

Bender, Ralf

2009-01-01

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
A model for prediction of color change after tooth bleaching based on CIELAB color space

NASA Astrophysics Data System (ADS)

Herrera, Luis J.; Santana, Janiley; Yebra, Ana; Rivas, María. José; Pulgar, Rosa; Pérez, María. M.

2017-08-01

An experimental study aiming to develop a model based on CIELAB color space for prediction of color change after a tooth bleaching procedure is presented. Multivariate linear regression models were obtained to predict the L*, a*, b* and W* post-bleaching values using the pre-bleaching L*, a*and b*values. Moreover, univariate linear regression models were obtained to predict the variation in chroma (C*), hue angle (h°) and W*. The results demonstrated that is possible to estimate color change when using a carbamide peroxide tooth-bleaching system. The models obtained can be applied in clinic to predict the colour change after bleaching.
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models

PubMed Central

Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo

2016-01-01

The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have superior prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models.

PubMed

Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A; Burgueño, Juan; Pérez-Rodríguez, Paulino; de Los Campos, Gustavo

2017-01-05

The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects [Formula: see text] that can be assessed by the Kronecker product of variance-covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model [Formula: see text] plus an extra component, F: , that captures random effects between environments that were not captured by the random effects [Formula: see text] We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have superior prediction ability than single-environment models, and the higher prediction ability of multi-environment models with [Formula: see text] over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect [Formula: see text]. Copyright © 2017 Cuevas et al.
Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

PubMed

Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

2018-05-10

Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.
Multi-objective Optimization of Pulsed Gas Metal Arc Welding Process Using Neuro NSGA-II

NASA Astrophysics Data System (ADS)

Pal, Kamal; Pal, Surjya K.

2018-05-01

Weld quality is a critical issue in fabrication industries where products are custom-designed. Multi-objective optimization results number of solutions in the pareto-optimal front. Mathematical regression model based optimization methods are often found to be inadequate for highly non-linear arc welding processes. Thus, various global evolutionary approaches like artificial neural network, genetic algorithm (GA) have been developed. The present work attempts with elitist non-dominated sorting GA (NSGA-II) for optimization of pulsed gas metal arc welding process using back propagation neural network (BPNN) based weld quality feature models. The primary objective to maintain butt joint weld quality is the maximization of tensile strength with minimum plate distortion. BPNN has been used to compute the fitness of each solution after adequate training, whereas NSGA-II algorithm generates the optimum solutions for two conflicting objectives. Welding experiments have been conducted on low carbon steel using response surface methodology. The pareto-optimal front with three ranked solutions after 20th generations was considered as the best without further improvement. The joint strength as well as transverse shrinkage was found to be drastically improved over the design of experimental results as per validated pareto-optimal solutions obtained.
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States

NASA Astrophysics Data System (ADS)

Yang, J.; Astitha, M.; Schwartz, C. S.

2017-12-01

Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.
Instrumental intelligent test of food sensory quality as mimic of human panel test combining multiple cross-perception sensors and data fusion.

PubMed

Ouyang, Qin; Zhao, Jiewen; Chen, Quansheng

2014-09-02

Instrumental test of food quality using perception sensors instead of human panel test is attracting massive attention recently. A novel cross-perception multi-sensors data fusion imitating multiple mammal perception was proposed for the instrumental test in this work. First, three mimic sensors of electronic eye, electronic nose and electronic tongue were used in sequence for data acquisition of rice wine samples. Then all data from the three different sensors were preprocessed and merged. Next, three cross-perception variables i.e., color, aroma and taste, were constructed using principal components analysis (PCA) and multiple linear regression (MLR) which were used as the input of models. MLR, back-propagation artificial neural network (BPANN) and support vector machine (SVM) were comparatively used for modeling, and the instrumental test was achieved for the comprehensive quality of samples. Results showed the proposed cross-perception multi-sensors data fusion presented obvious superiority to the traditional data fusion methodologies, also achieved a high correlation coefficient (>90%) with the human panel test results. This work demonstrated that the instrumental test based on the cross-perception multi-sensors data fusion can actually mimic the human test behavior, therefore is of great significance to ensure the quality of products and decrease the loss of the manufacturers. Copyright © 2014 Elsevier B.V. All rights reserved.
Modeling maximum daily temperature using a varying coefficient regression model

Treesearch

Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith

2014-01-01

Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...
Detection of crossover time scales in multifractal detrended fluctuation analysis

NASA Astrophysics Data System (ADS)

Ge, Erjia; Leung, Yee

2013-04-01

Fractal is employed in this paper as a scale-based method for the identification of the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors. A common method is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective since it has been made without rigorous statistical procedures and has generally been determined by eye balling or subjective observation. Crossover time scales such determined may be spurious and problematic. It may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through the regression analysis and statistical inference, we can (1) identify the crossover time scales that cannot be detected by eye-balling observation, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantive our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall of Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.
Estimation of aboveground biomass in Mediterranean forests by statistical modelling of ASTER fraction images

NASA Astrophysics Data System (ADS)

Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.

2014-09-01

Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (Radj2=0.632, the root-mean-squared error of estimated AGB was 13.3 Mg ha-1 (or 37.7%), resulting from cross-validation), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.

Multi-fidelity Gaussian process regression for prediction of random fields

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parussini, L.; Venturi, D., E-mail: venturi@ucsc.edu; Perdikaris, P.

We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgersmore » equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.« less
Carbon dioxide stripping in aquaculture -- part III: model verification

USGS Publications Warehouse

Colt, John; Watten, Barnaby; Pfeiffer, Tim

2012-01-01

Based on conventional mass transfer models developed for oxygen, the use of the non-linear ASCE method, 2-point method, and one parameter linear-regression method were evaluated for carbon dioxide stripping data. For values of KLaCO2 < approximately 1.5/h, the 2-point or ASCE method are a good fit to experimental data, but the fit breaks down at higher values of KLaCO2. How to correct KLaCO2 for gas phase enrichment remains to be determined. The one-parameter linear regression model was used to vary the C*CO2 over the test, but it did not result in a better fit to the experimental data when compared to the ASCE or fixed C*CO2 assumptions.
Post-processing through linear regression

NASA Astrophysics Data System (ADS)

van Schaeybroeck, B.; Vannitsem, S.

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Construction of multiple linear regression models using blood biomarkers for selecting against abdominal fat traits in broilers.

PubMed

Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H

2018-01-01

Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.
Does Nonlinear Modeling Play a Role in Plasmid Bioprocess Monitoring Using Fourier Transform Infrared Spectra?

PubMed

Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M

2017-06-01

The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
Solving a mixture of many random linear equations by tensor decomposition and alternating minimization.

DOT National Transportation Integrated Search

2016-09-01

We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...
Walk Score® and Transit Score® and Walking in the Multi-Ethnic Study of Atherosclerosis

PubMed Central

Hirsch, Jana A.; Moore, Kari A.; Evenson, Kelly R.; Rodriguez, Daniel A; Diez Roux, Ana V.

2013-01-01

Background Walk Score® and Transit Score® are open-source measures of the neighborhood built environment to support walking (“walkability”) and access to transportation. Purpose To investigate associations of Street Smart Walk Score and Transit Score with self-reported transport and leisure walking using data from a large multi-city and diverse population-based sample of adults. Methods Data from a sample of 4552 residents of Baltimore MD; Chicago IL; Forsyth County NC; Los Angeles CA; New York NY; and St. Paul MN from the Multi-Ethnic Study of Atherosclerosis (2010–2012) were linked to Walk Score and Transit Score (collected in 2012). Logistic and linear regression models estimated ORs of not walking and mean differences in minutes walked, respectively, associated with continuous and categoric Walk Score and Transit Score. All analyses were conducted in 2012. Results After adjustment for site, key sociodemographic, and health variables, a higher Walk Score was associated with lower odds of not walking for transport and more minutes/week of transport walking. Compared to those in a “walker’s paradise,” lower categories of Walk Score were associated with a linear increase in odds of not transport walking and a decline in minutes of leisure walking. An increase in Transit Score was associated with lower odds of not transport walking or leisure walking, and additional minutes/week of leisure walking. Conclusions Walk Score and Transit Score appear to be useful as measures of walkability in analyses of neighborhood effects. PMID:23867022
Stochastic and Statistical Analysis of Utility Revenues and Weather Data Analysis for Consumer Demand Estimation in Smart Grids

PubMed Central

Ali, S. M.; Mehmood, C. A; Khan, B.; Jawad, M.; Farid, U; Jadoon, J. K.; Ali, M.; Tareen, N. K.; Usman, S.; Majid, M.; Anwar, S. M.

2016-01-01

In smart grid paradigm, the consumer demands are random and time-dependent, owning towards stochastic probabilities. The stochastically varying consumer demands have put the policy makers and supplying agencies in a demanding position for optimal generation management. The utility revenue functions are highly dependent on the consumer deterministic stochastic demand models. The sudden drifts in weather parameters effects the living standards of the consumers that in turn influence the power demands. Considering above, we analyzed stochastically and statistically the effect of random consumer demands on the fixed and variable revenues of the electrical utilities. Our work presented the Multi-Variate Gaussian Distribution Function (MVGDF) probabilistic model of the utility revenues with time-dependent consumer random demands. Moreover, the Gaussian probabilities outcome of the utility revenues is based on the varying consumer n demands data-pattern. Furthermore, Standard Monte Carlo (SMC) simulations are performed that validated the factor of accuracy in the aforesaid probabilistic demand-revenue model. We critically analyzed the effect of weather data parameters on consumer demands using correlation and multi-linear regression schemes. The statistical analysis of consumer demands provided a relationship between dependent (demand) and independent variables (weather data) for utility load management, generation control, and network expansion. PMID:27314229
Stochastic and Statistical Analysis of Utility Revenues and Weather Data Analysis for Consumer Demand Estimation in Smart Grids.

PubMed

Ali, S M; Mehmood, C A; Khan, B; Jawad, M; Farid, U; Jadoon, J K; Ali, M; Tareen, N K; Usman, S; Majid, M; Anwar, S M

2016-01-01

In smart grid paradigm, the consumer demands are random and time-dependent, owning towards stochastic probabilities. The stochastically varying consumer demands have put the policy makers and supplying agencies in a demanding position for optimal generation management. The utility revenue functions are highly dependent on the consumer deterministic stochastic demand models. The sudden drifts in weather parameters effects the living standards of the consumers that in turn influence the power demands. Considering above, we analyzed stochastically and statistically the effect of random consumer demands on the fixed and variable revenues of the electrical utilities. Our work presented the Multi-Variate Gaussian Distribution Function (MVGDF) probabilistic model of the utility revenues with time-dependent consumer random demands. Moreover, the Gaussian probabilities outcome of the utility revenues is based on the varying consumer n demands data-pattern. Furthermore, Standard Monte Carlo (SMC) simulations are performed that validated the factor of accuracy in the aforesaid probabilistic demand-revenue model. We critically analyzed the effect of weather data parameters on consumer demands using correlation and multi-linear regression schemes. The statistical analysis of consumer demands provided a relationship between dependent (demand) and independent variables (weather data) for utility load management, generation control, and network expansion.
Multi-objective possibilistic model for portfolio selection with transaction cost

NASA Astrophysics Data System (ADS)

Jana, P.; Roy, T. K.; Mazumder, S. K.

2009-06-01

In this paper, we introduce the possibilistic mean value and variance of continuous distribution, rather than probability distributions. We propose a multi-objective Portfolio based model and added another entropy objective function to generate a well diversified asset portfolio within optimal asset allocation. For quantifying any potential return and risk, portfolio liquidity is taken into account and a multi-objective non-linear programming model for portfolio rebalancing with transaction cost is proposed. The models are illustrated with numerical examples.
Avoiding Boundary Estimates in Hierarchical Linear Models through Weakly Informative Priors

ERIC Educational Resources Information Center

Chung, Yeojin; Rabe-Hesketh, Sophia; Gelman, Andrew; Dorie, Vincent; Liu, Jinchen

2012-01-01

Hierarchical or multilevel linear models are widely used for longitudinal or cross-sectional data on students nested in classes and schools, and are particularly important for estimating treatment effects in cluster-randomized trials, multi-site trials, and meta-analyses. The models can allow for variation in treatment effects, as well as…
Adjusted variable plots for Cox's proportional hazards regression model.

PubMed

Hall, C B; Zeger, S L; Bandeen-Roche, K J

1996-01-01

Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each point, the regression coefficient and standard error from a Cox proportional hazards regression is obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
Selection of a Geostatistical Method to Interpolate Soil Properties of the State Crop Testing Fields using Attributes of a Digital Terrain Model

NASA Astrophysics Data System (ADS)

Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.

2018-03-01

The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with multiple linear regression drift model (RK + MLR), and regression kriging with principal component regression drift model (RK + PCR)—were examined. The results of the performed study were compiled into an algorithm of choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes irrational. In that case, the multiple linear regression should be used instead.
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

NASA Astrophysics Data System (ADS)

Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

2015-05-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
Low-rank regularization for learning gene expression programs.

PubMed

Ye, Guibo; Tang, Mengfan; Cai, Jian-Feng; Nie, Qing; Xie, Xiaohui

2013-01-01

Learning gene expression programs directly from a set of observations is challenging due to the complexity of gene regulation, high noise of experimental measurements, and insufficient number of experimental measurements. Imposing additional constraints with strong and biologically motivated regularizations is critical in developing reliable and effective algorithms for inferring gene expression programs. Here we propose a new form of regulation that constrains the number of independent connectivity patterns between regulators and targets, motivated by the modular design of gene regulatory programs and the belief that the total number of independent regulatory modules should be small. We formulate a multi-target linear regression framework to incorporate this type of regulation, in which the number of independent connectivity patterns is expressed as the rank of the connectivity matrix between regulators and targets. We then generalize the linear framework to nonlinear cases, and prove that the generalized low-rank regularization model is still convex. Efficient algorithms are derived to solve both the linear and nonlinear low-rank regularized problems. Finally, we test the algorithms on three gene expression datasets, and show that the low-rank regularization improves the accuracy of gene expression prediction in these three datasets.
Cost Estimation of Naval Ship Acquisition.

DTIC Science & Technology

1983-12-01

one a 9-sub- system model , the other a single total cost model . The models were developed using the linear least squares regression tech- nique with...to Linear Statistical Models , McGraw-Hill, 1961. 11. Helmer, F. T., Bibliography on Pricing Methodology and Cost Estimating, Dept. of Economics and...SUPPI.EMSaTARY NOTES IS. KWRo" (Cowaft. en tever aide of ..aesep M’ Idab~t 6 Week ONNa.) Cost estimation; Acquisition; Parametric cost estimate; linear
Understanding Child Stunting in India: A Comprehensive Analysis of Socio-Economic, Nutritional and Environmental Determinants Using Additive Quantile Regression

PubMed Central

Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.

2013-01-01

Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
Understanding child stunting in India: a comprehensive analysis of socio-economic, nutritional and environmental determinants using additive quantile regression.

PubMed

Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A

2013-01-01

Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Mathematical Model for the Contribution of Individual Organs to Non-Zero Y-Intercepts in Single and Multi-Compartment Linear Models of Whole-Body Energy Expenditure

PubMed Central

Kaiyala, Karl J.

2014-01-01

Mathematical models for the dependence of energy expenditure (EE) on body mass and composition are essential tools in metabolic phenotyping. EE scales over broad ranges of body mass as a non-linear allometric function. When considered within restricted ranges of body mass, however, allometric EE curves exhibit ‘local linearity.’ Indeed, modern EE analysis makes extensive use of linear models. Such models typically involve one or two body mass compartments (e.g., fat free mass and fat mass). Importantly, linear EE models typically involve a non-zero (usually positive) y-intercept term of uncertain origin, a recurring theme in discussions of EE analysis and a source of confounding in traditional ratio-based EE normalization. Emerging linear model approaches quantify whole-body resting EE (REE) in terms of individual organ masses (e.g., liver, kidneys, heart, brain). Proponents of individual organ REE modeling hypothesize that multi-organ linear models may eliminate non-zero y-intercepts. This could have advantages in adjusting REE for body mass and composition. Studies reveal that individual organ REE is an allometric function of total body mass. I exploit first-order Taylor linearization of individual organ REEs to model the manner in which individual organs contribute to whole-body REE and to the non-zero y-intercept in linear REE models. The model predicts that REE analysis at the individual organ-tissue level will not eliminate intercept terms. I demonstrate that the parameters of a linear EE equation can be transformed into the parameters of the underlying ‘latent’ allometric equation. This permits estimates of the allometric scaling of EE in a diverse variety of physiological states that are not represented in the allometric EE literature but are well represented by published linear EE analyses. PMID:25068692
Poisson Mixture Regression Models for Heart Disease Prediction.

PubMed

Mufudza, Chipo; Erol, Hamza

2016-01-01

Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.

Poisson Mixture Regression Models for Heart Disease Prediction

PubMed Central

Erol, Hamza

2016-01-01

Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
Automated generation of quantum-accurate classical interatomic potentials for metals and semiconductors

NASA Astrophysics Data System (ADS)

Thompson, Aidan; Foiles, Stephen; Schultz, Peter; Swiler, Laura; Trott, Christian; Tucker, Garritt

2013-03-01

Molecular dynamics (MD) is a powerful condensed matter simulation tool for bridging between macroscopic continuum models and quantum models (QM) treating a few hundred atoms, but is limited by the accuracy of available interatomic potentials. Sound physical and chemical understanding of these interactions have resulted in a variety of concise potentials for certain systems, but it is difficult to extend them to new materials and properties. The growing availability of large QM data sets has made it possible to use more automated machine-learning approaches. Bartók et al. demonstrated that the bispectrum of the local neighbor density provides good regression surrogates for QM models. We adopt a similar bispectrum representation within a linear regression scheme. We have produced potentials for silicon and tantalum, and we are currently extending the method to III-V compounds. Results will be presented demonstrating the accuracy of these potentials relative to the training data, as well as their ability to accurately predict material properties not explicitly included in the training data. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Dept. of Energy Nat. Nuclear Security Admin. under Contract DE-AC04-94AL85000.
Developing Multi-model Ensemble for Precipitation and Temperature Seasonal Forecasts: Implications for Karkheh River Basin in Iran

NASA Astrophysics Data System (ADS)

Najafi, Husain; Massah Bavani, Ali Reza; Wanders, Niko; Wood, Eric; Irannejad, Parviz; Robertson, Andrew

2017-04-01

Water resource managers can utilize reliable seasonal forecasts for allocating water between different users within a water year. In the west of Iran where a decline of renewable water resources has been observed, basin-wide water management has been the subject of many inter-provincial conflicts in recent years. The problem is exacerbated when the environmental water requirements is not provided leaving the Hoor-al-Azim marshland in the downstream dry. It has been argued that information on total seasonal rainfall can support the Iranian Ministry of Energy within the water year. This study explores the skill of the North America Multi Model Ensemble for Karkheh River Basin in the of west Iran. NMME seasonal precipitation and temperature forecasts from eight models are evaluated against PERSIANN-CDR and Climate Research Unit (CRU) datasets. Analysis suggests that anomaly correlation for both precipitation and temperature is greater than 0.4 for all individual models. Lead time-dependent seasonal forecasts are improved when a multi-model ensemble is developed for the river basin using stepwise linear regression model. MME R-squared exceeds 0.6 for temperature for almost all initializations suggesting high skill of NMME in Karkheh river basin. The skill of MME for rainfall forecasts is high for 1-month lead time for October, February, March and October initializations. However, for months when the amount of rainfall accounts for a significant proportion of total annual rainfall, the skill of NMME is limited a month in advance. It is proposed that operational regional water companies incorporate NMME seasonal forecasts into water resource planning and management, especially during growing seasons that are essential for agricultural risk management.
Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

PubMed Central

Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

2015-01-01

Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
Modified Regression Correlation Coefficient for Poisson Regression Model

NASA Astrophysics Data System (ADS)

Kaengthong, Nattacha; Domthong, Uthumporn

2017-09-01

This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
Quantitative MRI establishes the efficacy of PI3K inhibitor (GDC-0941) multi-treatments in PTEN-deficient mice lymphoma.

PubMed

Wullschleger, Stephan; García-Martínez, Juan M; Duce, Suzanne L

2012-02-01

To assess the efficacy of multiple treatment of phosphatidylinositol-3-kinase (PI3K) inhibitor on autochthonous tumours in phosphatase and tensin homologue (Pten)-deficient genetically engineered mouse cancer models using a longitudinal magnetic resonance imaging (MRI) protocol. Using 3D MRI, B-cell follicular lymphoma growth was quantified in a Pten(+/-)Lkb1(+/hypo) mouse line, before, during and after repeated treatments with a PI3K inhibitor GDC-0941 (75 mg/kg). Mean pre-treatment linear tumour growth rate was 16.5±12.8 mm(3)/week. Repeated 28-day GDC-0941 administration, with 21 days 'off-treatment', induced average tumour regression of 41±7%. Upon cessation of the second treatment (which was not permanently cytocidal), tumours re-grew with an average linear growth rate of 40.1±15.5 mm(3)/week. There was no evidence of chemoresistance. This protocol can accommodate complex dosing schedules, as well as combine different cancer therapies. It reduces biological variability problems and resulted in a 10-fold reduction in mouse numbers compared with terminal assessment methods. It is ideal for preclinical efficacy studies and for phenotyping molecularly characterized mouse models when investigating gene function.
Seasonal prediction of East Asian summer rainfall using a multi-model ensemble system

NASA Astrophysics Data System (ADS)

Ahn, Joong-Bae; Lee, Doo-Young; Yoo, Jin‑Ho

2015-04-01

Using the retrospective forecasts of seven state-of-the-art coupled models and their multi-model ensemble (MME) for boreal summers, the prediction skills of climate models in the western tropical Pacific (WTP) and East Asian region are assessed. The prediction of summer rainfall anomalies in East Asia is difficult, while the WTP has a strong correlation between model prediction and observation. We focus on developing a new approach to further enhance the seasonal prediction skill for summer rainfall in East Asia and investigate the influence of convective activity in the WTP on East Asian summer rainfall. By analyzing the characteristics of the WTP convection, two distinct patterns associated with El Niño-Southern Oscillation developing and decaying modes are identified. Based on the multiple linear regression method, the East Asia Rainfall Index (EARI) is developed by using the interannual variability of the normalized Maritime continent-WTP Indices (MPIs), as potentially useful predictors for rainfall prediction over East Asia, obtained from the above two main patterns. For East Asian summer rainfall, the EARI has superior performance to the East Asia summer monsoon index or each MPI. Therefore, the regressed rainfall from EARI also shows a strong relationship with the observed East Asian summer rainfall pattern. In addition, we evaluate the prediction skill of the East Asia reconstructed rainfall obtained by hybrid dynamical-statistical approach using the cross-validated EARI from the individual models and their MME. The results show that the rainfalls reconstructed from simulations capture the general features of observed precipitation in East Asia quite well. This study convincingly demonstrates that rainfall prediction skill is considerably improved by using a hybrid dynamical-statistical approach compared to the dynamical forecast alone. Acknowledgements This work was carried out with the support of Rural Development Administration Cooperative Research Program for Agriculture Science and Technology Development under grant project PJ009353 and Korea Meteorological Administration Research and Development Program under grant CATER 2012-3100, Republic of Korea.
Construction of mathematical model for measuring material concentration by colorimetric method

NASA Astrophysics Data System (ADS)

Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua

2018-06-01

This paper use the method of multiple linear regression to discuss the data of C problem of mathematical modeling in 2017. First, we have established a regression model for the concentration of 5 substances. But only the regression model of the substance concentration of urea in milk can pass through the significance test. The regression model established by the second sets of data can pass the significance test. But this model exists serious multicollinearity. We have improved the model by principal component analysis. The improved model is used to control the system so that it is possible to measure the concentration of material by direct colorimetric method.
Assessing risk factors for periodontitis using regression

NASA Astrophysics Data System (ADS)

Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

2013-10-01

Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.
The Stability and Interfacial Motion of Multi-layer Radial Porous Media and Hele-Shaw Flows

NASA Astrophysics Data System (ADS)

Gin, Craig; Daripa, Prabir

2017-11-01

In this talk, we will discuss viscous fingering instabilities of multi-layer immiscible porous media flows within the Hele-Shaw model in a radial flow geometry. We study the motion of the interfaces for flows with both constant and variable viscosity fluids. We consider the effects of using a variable injection rate on multi-layer flows. We also present a numerical approach to simulating the interface motion within linear theory using the method of eigenfunction expansion. We compare these results with fully non-linear simulations.
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

NASA Astrophysics Data System (ADS)

Kim, Young Gyun; Lee, Jongsoo

2016-08-01

In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
A comparative study of linear and nonlinear MIMO feedback configurations

NASA Technical Reports Server (NTRS)

Desoer, C. A.; Lin, C. A.

1984-01-01

In this paper, a comparison is conducted of several feedback configurations which have appeared in the literature (e.g. unity-feedback, model-reference, etc.). The linear time-invariant multi-input multi-output case is considered. For each configuration, the stability conditions are specified, the relation between achievable I/O maps and the achievable disturbance-to-output maps is examined, and the effect of various subsystem perturbations on the system performance is studied. In terms of these considerations, it is demonstrated that one of the configurations considered is better than all the others. The results are then extended to the nonlinear multi-input multi-output case.
Photometric redshift estimation based on data mining with PhotoRApToR

NASA Astrophysics Data System (ADS)

Cavuoti, S.; Brescia, M.; De Stefano, V.; Longo, G.

2015-03-01

Photometric redshifts (photo-z) are crucial to the scientific exploitation of modern panchromatic digital surveys. In this paper we present PhotoRApToR (Photometric Research Application To Redshift): a Java/C ++ based desktop application capable to solve non-linear regression and multi-variate classification problems, in particular specialized for photo-z estimation. It embeds a machine learning algorithm, namely a multi-layer neural network trained by the Quasi Newton learning rule, and special tools dedicated to pre- and post-processing data. PhotoRApToR has been successfully tested on several scientific cases. The application is available for free download from the DAME Program web site.
Forecasting daily patient volumes in the emergency department.

PubMed

Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L

2008-02-01

Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.
Simple, Efficient Estimators of Treatment Effects in Randomized Trials Using Generalized Linear Models to Leverage Baseline Variables

PubMed Central

Rosenblum, Michael; van der Laan, Mark J.

2010-01-01

Models, such as logistic regression and Poisson regression models, are often used to estimate treatment effects in randomized trials. These models leverage information in variables collected before randomization, in order to obtain more precise estimates of treatment effects. However, there is the danger that model misspecification will lead to bias. We show that certain easy to compute, model-based estimators are asymptotically unbiased even when the working model used is arbitrarily misspecified. Furthermore, these estimators are locally efficient. As a special case of our main result, we consider a simple Poisson working model containing only main terms; in this case, we prove the maximum likelihood estimate of the coefficient corresponding to the treatment variable is an asymptotically unbiased estimator of the marginal log rate ratio, even when the working model is arbitrarily misspecified. This is the log-linear analog of ANCOVA for linear models. Our results demonstrate one application of targeted maximum likelihood estimation. PMID:20628636
Application of Multi-task Lasso Regression in the Parametrization of Stellar Spectra

NASA Astrophysics Data System (ADS)

Chang, Li-Na; Zhang, Pei-Ai

2015-07-01

The multi-task learning approaches have attracted the increasing attention in the fields of machine learning, computer vision, and artificial intelligence. By utilizing the correlations in tasks, learning multiple related tasks simultaneously is better than learning each task independently. An efficient multi-task Lasso (Least Absolute Shrinkage Selection and Operator) regression algorithm is proposed in this paper to estimate the physical parameters of stellar spectra. It not only can obtain the information about the common features of the different physical parameters, but also can preserve effectively their own peculiar features. Experiments were done based on the ELODIE synthetic spectral data simulated with the stellar atmospheric model, and on the SDSS data released by the American large-scale survey Sloan. The estimation precision of our model is better than those of the methods in the related literature, especially for the estimates of the gravitational acceleration (lg g) and the chemical abundance ([Fe/H]). In the experiments we changed the spectral resolution, and applied the noises with different signal-to-noise ratios (SNRs) to the spectral data, so as to illustrate the stability of the model. The results show that the model is influenced by both the resolution and the noise. But the influence of the noise is larger than that of the resolution. In general, the multi-task Lasso regression algorithm is easy to operate, it has a strong stability, and can also improve the overall prediction accuracy of the model.
Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

NASA Astrophysics Data System (ADS)

Caimmi, R.

2011-08-01

Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both heteroscedastic and homoscedastic data. Conversely, samples related to different methods produce discrepant results, due to the presence of (still undetected) systematic errors, which implies no definitive statement can be made at present. A comparison is also made between different expressions of regression line slope and intercept variance estimators, where fractional discrepancies are found to be not exceeding a few percent, which grows up to about 20% in the presence of large dispersion data. An extension of the formalism to structural models is left to a forthcoming paper.
Reliability of the Load-Velocity Relationship Obtained Through Linear and Polynomial Regression Models to Predict the One-Repetition Maximum Load.

PubMed

Pestaña-Melero, Francisco Luis; Haff, G Gregory; Rojas, Francisco Javier; Pérez-Castilla, Alejandro; García-Ramos, Amador

2017-12-18

This study aimed to compare the between-session reliability of the load-velocity relationship between (1) linear vs. polynomial regression models, (2) concentric-only vs. eccentric-concentric bench press variants, as well as (3) the within-participants vs. the between-participants variability of the velocity attained at each percentage of the one-repetition maximum (%1RM). The load-velocity relationship of 30 men (age: 21.2±3.8 y; height: 1.78±0.07 m, body mass: 72.3±7.3 kg; bench press 1RM: 78.8±13.2 kg) were evaluated by means of linear and polynomial regression models in the concentric-only and eccentric-concentric bench press variants in a Smith Machine. Two sessions were performed with each bench press variant. The main findings were: (1) first-order-polynomials (CV: 4.39%-4.70%) provided the load-velocity relationship with higher reliability than second-order-polynomials (CV: 4.68%-5.04%); (2) the reliability of the load-velocity relationship did not differ between the concentric-only and eccentric-concentric bench press variants; (3) the within-participants variability of the velocity attained at each %1RM was markedly lower than the between-participants variability. Taken together, these results highlight that, regardless of the bench press variant considered, the individual determination of the load-velocity relationship by a linear regression model could be recommended to monitor and prescribe the relative load in the Smith machine bench press exercise.
Multivariate functional response regression, with application to fluorescence spectroscopy in a cervical pre-cancer study.

PubMed

Zhu, Hongxiao; Morris, Jeffrey S; Wei, Fengrong; Cox, Dennis D

2017-07-01

Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation-a basis expansion to each functional variable followed by principal component analysis for the concatenated basis coefficients. This transformation effectively reduces the intra-and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.
Evaluation strategies for isotope ratio measurements of single particles by LA-MC-ICPMS.

PubMed

Kappel, S; Boulyga, S F; Dorta, L; Günther, D; Hattendorf, B; Koffler, D; Laaha, G; Leisch, F; Prohaska, T

2013-03-01

Data evaluation is a crucial step when it comes to the determination of accurate and precise isotope ratios computed from transient signals measured by multi-collector-inductively coupled plasma mass spectrometry (MC-ICPMS) coupled to, for example, laser ablation (LA). In the present study, the applicability of different data evaluation strategies (i.e. 'point-by-point', 'integration' and 'linear regression slope' method) for the computation of (235)U/(238)U isotope ratios measured in single particles by LA-MC-ICPMS was investigated. The analyzed uranium oxide particles (i.e. 9073-01-B, CRM U010 and NUSIMEP-7 test samples), having sizes down to the sub-micrometre range, are certified with respect to their (235)U/(238)U isotopic signature, which enabled evaluation of the applied strategies with respect to precision and accuracy. The different strategies were also compared with respect to their expanded uncertainties. Even though the 'point-by-point' method proved to be superior, the other methods are advantageous, as they take weighted signal intensities into account. For the first time, the use of a 'finite mixture model' is presented for the determination of an unknown number of different U isotopic compositions of single particles present on the same planchet. The model uses an algorithm that determines the number of isotopic signatures by attributing individual data points to computed clusters. The (235)U/(238)U isotope ratios are then determined by means of the slopes of linear regressions estimated for each cluster. The model was successfully applied for the accurate determination of different (235)U/(238)U isotope ratios of particles deposited on the NUSIMEP-7 test samples.

Temporal Synchronization Analysis for Improving Regression Modeling of Fecal Indicator Bacteria Levels

EPA Science Inventory

Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Consensus for linear multi-agent system with intermittent information transmissions using the time-scale theory

NASA Astrophysics Data System (ADS)

Taousser, Fatima; Defoort, Michael; Djemai, Mohamed

2016-01-01

This paper investigates the consensus problem for linear multi-agent system with fixed communication topology in the presence of intermittent communication using the time-scale theory. Since each agent can only obtain relative local information intermittently, the proposed consensus algorithm is based on a discontinuous local interaction rule. The interaction among agents happens at a disjoint set of continuous-time intervals. The closed-loop multi-agent system can be represented using mixed linear continuous-time and linear discrete-time models due to intermittent information transmissions. The time-scale theory provides a powerful tool to combine continuous-time and discrete-time cases and study the consensus protocol under a unified framework. Using this theory, some conditions are derived to achieve exponential consensus under intermittent information transmissions. Simulations are performed to validate the theoretical results.
Influence of Teachers' Personal Health Behaviors on Operationalizing Obesity Prevention Policy in Head Start Preschools: A Project of the Children's Healthy Living Program (CHL).

PubMed

Esquivel, Monica Kazlausky; Nigg, Claudio R; Fialkowski, Marie K; Braun, Kathryn L; Li, Fenfang; Novotny, Rachel

2016-05-01

To quantify the Head Start (HS) teacher mediating and moderating influence on the effect of a wellness policy intervention. Intervention trial within a larger randomized community trial. HS preschools in Hawaii. Twenty-three HS classrooms located within 2 previously randomized communities. Seven-month multi-component intervention with policy changes to food served and service style, initiatives for employee wellness, classroom activities for preschoolers promoting physical activity (PA) and healthy eating, and training and technical assistance. The Environment and Policy Assessment and Observation (EPAO) classroom scores and teacher questionnaires assessing on knowledge, beliefs, priorities, and misconceptions around child nutrition and changes in personal health behaviors and status were the main outcome measures. Paired t tests and linear regression analysis tested the intervention effects on the classroom and mediating and moderating effects of the teacher variables on the classroom environment. General linear model test showed greater intervention effect on the EPAO score where teachers reported higher than average improvements in their own health status and behaviors (estimate [SE] = -2.47 (0.78), P < .05). Strategies to improve teacher health status and behaviors included in a multi-component policy intervention aimed at child obesity prevention may produce a greater effect on classroom environments. Copyright © 2016 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Multi-timescale sediment responses across a human impacted river-estuary system

NASA Astrophysics Data System (ADS)

Chen, Yining; Chen, Nengwang; Li, Yan; Hong, Huasheng

2018-05-01

Hydrological processes regulating sediment transport from land to sea have been widely studied. However, anthropogenic factors controlling the river flow-sediment regime and subsequent response of the estuary are still poorly understood. Here we conducted a multi-timescale analysis on flow and sediment discharges during the period 1967-2014 for the two tributaries of the Jiulong River in Southeast China. The long-term flow-sediment relationship remained linear in the North River throughout the period, while the linearity showed a remarkable change after 1995 in the West River, largely due to construction of dams and reservoirs in the upland watershed. Over short timescales, rainstorm events caused the changes of suspended sediment concentration (SSC) in the rivers. Regression analysis using synchronous SSC data in a wet season (2009) revealed a delayed response (average 5 days) of the estuary to river input, and a box-model analysis established a quantitative relationship to further describe the response of the estuary to the river sediment input over multiple timescales. The short-term response is determined by both the vertical SSC-salinity changes and the sediment trapping rate in the estuary. However, over the long term, the reduction of riverine sediment yield increased marine sediments trapped into the estuary. The results of this study indicate that human activities (e.g., dams) have substantially altered sediment delivery patterns and river-estuary interactions at multiple timescales.
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach

NASA Astrophysics Data System (ADS)

Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew

2017-05-01

This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice

ERIC Educational Resources Information Center

Rapp, Kelly E.

2012-01-01

The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational attainment) are…
Synthesis of linear regression coefficients by recovering the within-study covariance matrix from summary statistics.

PubMed

Yoneoka, Daisuke; Henmi, Masayuki

2017-06-01

Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Environmental factors and flow paths related to Escherichia coli concentrations at two beaches on Lake St. Clair, Michigan, 2002–2005

USGS Publications Warehouse

Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.

2008-01-01

Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
Area under the curve predictions of dalbavancin, a new lipoglycopeptide agent, using the end of intravenous infusion concentration data point by regression analyses such as linear, log-linear and power models.

PubMed

Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally

2018-02-01

1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. Area under plasma concentration versus time curve (AUC inf ) of dalbavancin is a key parameter and AUC inf /MIC ratio is a critical pharmacodynamic marker. 2. Using end of intravenous infusion concentration (i.e. C max ) C max versus AUC inf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUC inf were performed using published C max data by application of regression equations. The quotient of observed/predicted values rendered fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The C max versus AUC inf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE < 10.3%. The external data evaluation showed that the models predicted AUC inf with a RMSE of 3.02-27.46% with fold difference largely contained within 0.64-1.48. 5. Regardless of the regression models, a single time point strategy of using C max (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUC inf of dalbavancin in patients.
A hierarchical linear model for tree height prediction.

Treesearch

Vicente J. Monleon

2003-01-01

Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...
Do factors related to combustion-based sources explain ...

EPA Pesticide Factsheets

Introduction: Spatial heterogeneity of effect estimates in associations between PM2.5 and total non-accidental mortality (TNA) in the United States (US), is an issue in epidemiology. This study uses rate ratios generated from the Multi-City/Multi-Pollutant study (1999-2005) for 313 core-based statistical areas (CBSA) and their metropolitan divisions (MD) to examine combustion-based sources of heterogeneity.Methods: For CBSA/MDs, area-specific log rate ratios (betas) were derived from a model adjusting for time, an interaction with age-group, day of week, and natural splines of current temperature, current dew point, and unconstrained temperature at lags 1, 2, and 3. We assessed the heterogeneity in the betas by linear regression with inverse variance weights, using average NO2, SO2, and CO, which may act as a combustion source proxy, and these pollutants’ correlations with PM2.5. Results: We found that weighted mean PM2.5 association (0.96 percent increase in total non-accidental mortality for a 10 µg/m3 increment in PM2.5) increased by 0.26 (95% confidence interval 0.08 , 0.44) for an interquartile change (0.2) in the correlation of SO2 and PM2.5., but betas showed less dependence on the annual averages of SO2 or NO2. Spline analyses suggest departures from linearity, particularly in a model that examined correlations between PM2.5 and CO.Conclusions: We conclude that correlations between SO2 and PM2.5 as an indicator of combustion sources explains some hete
Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

PubMed

Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

2008-01-01

This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
Forcing Regression through a Given Point Using Any Familiar Computational Routine.

DTIC Science & Technology

1983-03-01

a linear model , Y =a + OX + e ( Model I) then adopt the principle of least squares; and use sample data to estimate the unknown parameters, a and 8...has an expected value of zero indicates that the "average" response is considered linear . If c varies widely, Model I, though conceptually correct, may...relationship is linear from the maximum observed x to x - a, then Model II should be used. To pro- ceed with the customary evaluation of Model I would be
Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.

PubMed

Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu

2016-08-01

This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Orthogonal Projection in Teaching Regression and Financial Mathematics

ERIC Educational Resources Information Center

Kachapova, Farida; Kachapov, Ilias

2010-01-01

Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.

PubMed

Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard

2017-04-01

To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
Predicting musically induced emotions from physiological inputs: linear and neural network models.

PubMed

Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M

2013-01-01

Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants-heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
A Linear Regression Model Identifying the Primary Factors Contributing to Maintenance Man Hours for the C-17 Globemaster III in the Air National Guard

DTIC Science & Technology

2012-06-15

Maintenance AFSCs ................................................................................................. 14 2. Variation Inflation Factors...total variability in the data. It is an indication of how much of the 20 variation in the data can be accounted for in the regression model. In... Variation Inflation Factors for each independent variable (predictor) as regressed against all of the other independent variables in the model. The
Quantile regression models of animal habitat relationships

USGS Publications Warehouse

Cade, Brian S.

2003-01-01

Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). Depending on whether interactions of the measured habitat and unmeasured variable were negative (interference interactions) or positive (facilitation interactions), either upper (τ > 0.5) or lower (τ < 0.5) quantile regression parameters were less biased than mean rate parameters. Sampling (n = 20 - 300) simulations demonstrated that confidence intervals constructed by inverting rankscore tests provided valid coverage of these biased parameters. Quantile regression was used to estimate effects of physical habitat resources on a bivalve mussel (Macomona liliana) in a New Zealand harbor by modeling the spatial trend surface as a cubic polynomial of location coordinates.
Growth and yield in Eucalyptus globulus

Treesearch

James A. Rinehart; Richard B. Standiford

1983-01-01

A study of the major Eucalyptus globulus stands throughout California conducted by Woodbridge Metcalf in 1924 provides a complete and accurate data set for generating variable site-density yield models. Two models were developed using linear regression techniques. Model I depicts a linear relationship between age and yield best used for stands between five and fifteen...

Linear regression models for solvent accessibility prediction in proteins.

PubMed

Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław

2005-04-01

The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
Multiple-trait random regression models for the estimation of genetic parameters for milk, fat, and protein yield in buffaloes.

PubMed

Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto

2013-09-01

In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using the regression analyses by Bayesian inference applying an animal model by Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively, the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21-0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that the selection for any trait test day will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be because to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Study on power grid characteristics in summer based on Linear regression analysis

NASA Astrophysics Data System (ADS)

Tang, Jin-hui; Liu, You-fei; Liu, Juan; Liu, Qiang; Liu, Zhuan; Xu, Xi

2018-05-01

The correlation analysis of power load and temperature is the precondition and foundation for accurate load prediction, and a great deal of research has been made. This paper constructed the linear correlation model between temperature and power load, then the correlation of fault maintenance work orders with the power load is researched. Data details of Jiangxi province in 2017 summer such as temperature, power load, fault maintenance work orders were adopted in this paper to develop data analysis and mining. Linear regression models established in this paper will promote electricity load growth forecast, fault repair work order review, distribution network operation weakness analysis and other work to further deepen the refinement.
Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

NASA Astrophysics Data System (ADS)

Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

2014-07-01

Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Efficient least angle regression for identification of linear-in-the-parameters models

PubMed Central

Beach, Thomas H.; Rezgui, Yacine

2017-01-01

Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
Linear and evolutionary polynomial regression models to forecast coastal dynamics: Comparison and reliability assessment

NASA Astrophysics Data System (ADS)

Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe

2018-01-01

In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.
Rapid Multi-Tracer PET Tumor Imaging With F-FDG and Secondary Shorter-Lived Tracers.

PubMed

Black, Noel F; McJames, Scott; Kadrmas, Dan J

2009-10-01

Rapid multi-tracer PET, where two to three PET tracers are rapidly scanned with staggered injections, can recover certain imaging measures for each tracer based on differences in tracer kinetics and decay. We previously showed that single-tracer imaging measures can be recovered to a certain extent from rapid dual-tracer (62)Cu - PTSM (blood flow) + (62)Cu - ATSM (hypoxia) tumor imaging. In this work, the feasibility of rapidly imaging (18)F-FDG plus one or two of these shorter-lived secondary tracers was evaluated in the same tumor model. Dynamic PET imaging was performed in four dogs with pre-existing tumors, and the raw scan data was combined to emulate 60 minute long dual- and triple-tracer scans, using the single-tracer scans as gold standards. The multi-tracer data were processed for static (SUV) and kinetic (K(1), K(net)) endpoints for each tracer, followed by linear regression analysis of multi-tracer versus single-tracer results. Static and quantitative dynamic imaging measures of FDG were both accurately recovered from the multi-tracer scans, closely matching the single-tracer FDG standards (R > 0.99). Quantitative blood flow information, as measured by PTSM K(1) and SUV, was also accurately recovered from the multi-tracer scans (R = 0.97). Recovery of ATSM kinetic parameters proved more difficult, though the ATSM SUV was reasonably well recovered (R = 0.92). We conclude that certain additional information from one to two shorter-lived PET tracers may be measured in a rapid multi-tracer scan alongside FDG without compromising the assessment of glucose metabolism. Such additional and complementary information has the potential to improve tumor characterization in vivo, warranting further investigation of rapid multi-tracer techniques.
Rapid Multi-Tracer PET Tumor Imaging With 18F-FDG and Secondary Shorter-Lived Tracers

PubMed Central

Black, Noel F.; McJames, Scott; Kadrmas, Dan J.

2009-01-01

Rapid multi-tracer PET, where two to three PET tracers are rapidly scanned with staggered injections, can recover certain imaging measures for each tracer based on differences in tracer kinetics and decay. We previously showed that single-tracer imaging measures can be recovered to a certain extent from rapid dual-tracer 62Cu – PTSM (blood flow) + 62Cu — ATSM (hypoxia) tumor imaging. In this work, the feasibility of rapidly imaging 18F-FDG plus one or two of these shorter-lived secondary tracers was evaluated in the same tumor model. Dynamic PET imaging was performed in four dogs with pre-existing tumors, and the raw scan data was combined to emulate 60 minute long dual- and triple-tracer scans, using the single-tracer scans as gold standards. The multi-tracer data were processed for static (SUV) and kinetic (K1, Knet) endpoints for each tracer, followed by linear regression analysis of multi-tracer versus single-tracer results. Static and quantitative dynamic imaging measures of FDG were both accurately recovered from the multi-tracer scans, closely matching the single-tracer FDG standards (R > 0.99). Quantitative blood flow information, as measured by PTSM K1 and SUV, was also accurately recovered from the multi-tracer scans (R = 0.97). Recovery of ATSM kinetic parameters proved more difficult, though the ATSM SUV was reasonably well recovered (R = 0.92). We conclude that certain additional information from one to two shorter-lived PET tracers may be measured in a rapid multi-tracer scan alongside FDG without compromising the assessment of glucose metabolism. Such additional and complementary information has the potential to improve tumor characterization in vivo, warranting further investigation of rapid multi-tracer techniques. PMID:20046800
Linear regression analysis: part 14 of a series on evaluation of scientific publications.

PubMed

Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

2010-11-01

Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Modeling of Focused Acoustic Field of a Concave Multi-annular Phased Array Using Spheroidal Beam Equation

NASA Astrophysics Data System (ADS)

Yu, Li-Li; Shou, Wen-De; Hui, Chun

2012-02-01

A theoretical model of focused acoustic field for a multi-annular phased array on concave spherical surface is proposed. In this model, the source boundary conditions of the spheroidal beam equation (SBE) for multi-annular phased elements are studied. Acoustic field calculated by the dynamic focusing model of SBE is compared with numerical results of the O'Neil and Khokhlov—Zabolotskaya—Kuznetsov (KZK) model, respectively. Axial dynamic focusing and the harmonic effects are presented. The results demonstrate that the dynamic focusing model of SBE is good valid for a concave multi-annular phased array with a large aperture angle in the linear or nonlinear field.
Linear Modeling and Evaluation of Controls on Flow Response in Western Post-Fire Watersheds

NASA Astrophysics Data System (ADS)

Saxe, S.; Hogue, T. S.; Hay, L.

2015-12-01

This research investigates the impact of wildfires on watershed flow regimes throughout the western United States, specifically focusing on evaluation of fire events within specified subregions and determination of the impact of climate and geophysical variables in post-fire flow response. Fire events were collected through federal and state-level databases and streamflow data were collected from U.S. Geological Survey stream gages. 263 watersheds were identified with at least 10 years of continuous pre-fire daily streamflow records and 5 years of continuous post-fire daily flow records. For each watershed, percent changes in runoff ratio (RO), annual seven day low-flows (7Q2) and annual seven day high-flows (7Q10) were calculated from pre- to post-fire. Numerous independent variables were identified for each watershed and fire event, including topographic, land cover, climate, burn severity, and soils data. The national watersheds were divided into five regions through K-clustering and a lasso linear regression model, applying the Leave-One-Out calibration method, was calculated for each region. Nash-Sutcliffe Efficiency (NSE) was used to determine the accuracy of the resulting models. The regions encompassing the United States along and west of the Rocky Mountains, excluding the coastal watersheds, produced the most accurate linear models. The Pacific coast region models produced poor and inconsistent results, indicating that the regions need to be further subdivided. Presently, RO and HF response variables appear to be more easily modeled than LF. Results of linear regression modeling showed varying importance of watershed and fire event variables, with conflicting correlation between land cover types and soil types by region. The addition of further independent variables and constriction of current variables based on correlation indicators is ongoing and should allow for more accurate linear regression modeling.
Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.

PubMed

Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander

2017-01-01

Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.
Weather Impact on Airport Arrival Meter Fix Throughput

NASA Technical Reports Server (NTRS)

Wang, Yao

2017-01-01

Time-based flow management provides arrival aircraft schedules based on arrival airport conditions, airport capacity, required spacing, and weather conditions. In order to meet a scheduled time at which arrival aircraft can cross an airport arrival meter fix prior to entering the airport terminal airspace, air traffic controllers make regulations on air traffic. Severe weather may create an airport arrival bottleneck if one or more of airport arrival meter fixes are partially or completely blocked by the weather and the arrival demand has not been reduced accordingly. Under these conditions, aircraft are frequently being put in holding patterns until they can be rerouted. A model that predicts the weather impacted meter fix throughput may help air traffic controllers direct arrival flows into the airport more efficiently, minimizing arrival meter fix congestion. This paper presents an analysis of air traffic flows across arrival meter fixes at the Newark Liberty International Airport (EWR). Several scenarios of weather impacted EWR arrival fix flows are described. Furthermore, multiple linear regression and regression tree ensemble learning approaches for translating multiple sector Weather Impacted Traffic Indexes (WITI) to EWR arrival meter fix throughputs are examined. These weather translation models are developed and validated using the EWR arrival flight and weather data for the period of April-September in 2014. This study also compares the performance of the regression tree ensemble with traditional multiple linear regression models for estimating the weather impacted throughputs at each of the EWR arrival meter fixes. For all meter fixes investigated, the results from the regression tree ensemble weather translation models show a stronger correlation between model outputs and observed meter fix throughputs than that produced from multiple linear regression method.
[Use of multiple regression models in observational studies (1970-2013) and requirements of the STROBE guidelines in Spanish scientific journals].

PubMed

Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M

In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.
An Alternative Method for Allocating Base Maintenance Supplies to Mission, Design, and Series Aircraft in the United States Air Force.

DTIC Science & Technology

1987-09-01

Edition,. Fail 1986. 33. Neter, John et al. Applied Linear Regression MoceL. Homewood IL: Richard D. Irwin, Incorporated, iJ83. 34. NovicK, David... Linear Regression Models (33) then, for each sample observation (X fh, the method of least squares considers the deviation of Yubms from its expected value...for finding good estimators of b - b5 * In -2raer to explain the procedure, the model Yubms = b0 + b!xfh will be discussed. According to Applied
A method for nonlinear exponential regression analysis

NASA Technical Reports Server (NTRS)

Junkin, B. G.

1971-01-01

A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
Earthquakes Magnitude Predication Using Artificial Neural Network in Northern Red Sea Area

NASA Astrophysics Data System (ADS)

Alarifi, A. S.; Alarifi, N. S.

2009-12-01

Earthquakes are natural hazards that do not happen very often, however they may cause huge losses in life and property. Early preparation for these hazards is a key factor to reduce their damage and consequence. Since early ages, people tried to predicate earthquakes using simple observations such as strange or a typical animal behavior. In this paper, we study data collected from existing earthquake catalogue to give better forecasting for future earthquakes. The 16000 events cover a time span of 1970 to 2009, the magnitude range from greater than 0 to less than 7.2 while the depth range from greater than 0 to less than 100km. We propose a new artificial intelligent predication system based on artificial neural network, which can be used to predicate the magnitude of future earthquakes in northern Red Sea area including the Sinai Peninsula, the Gulf of Aqaba, and the Gulf of Suez. We propose a feed forward new neural network model with multi-hidden layers to predicate earthquakes occurrences and magnitudes in northern Red Sea area. Although there are similar model that have been published before in different areas, to our best knowledge this is the first neural network model to predicate earthquake in northern Red Sea area. Furthermore, we present other forecasting methods such as moving average over different interval, normally distributed random predicator, and uniformly distributed random predicator. In addition, we present different statistical methods and data fitting such as linear, quadratic, and cubic regression. We present a details performance analyses of the proposed methods for different evaluation metrics. The results show that neural network model provides higher forecast accuracy than other proposed methods. The results show that neural network achieves an average absolute error of 2.6% while an average absolute error of 3.8%, 7.3% and 6.17% for moving average, linear regression and cubic regression, respectively. In this work, we show an analysis of earthquakes data in northern Red Sea area for different statistics parameters such as correlation, mean, standard deviation, and other. This analysis is to provide a deep understand of the Seismicity of the area, and existing patterns.
The role of shoe design on the prediction of free torque at the shoe-surface interface using pressure insole technology.

PubMed

Weaver, Brian Thomas; Fitzsimons, Kathleen; Braman, Jerrod; Haut, Roger

2016-09-01

The goal of the current study was to expand on previous work to validate the use of pressure insole technology in conjunction with linear regression models to predict the free torque at the shoe-surface interface that is generated while wearing different athletic shoes. Three distinctly different shoe designs were utilised. The stiffness of each shoe was determined with a material's testing machine. Six participants wore each shoe that was fitted with an insole pressure measurement device and performed rotation trials on an embedded force plate. A pressure sensor mask was constructed from those sensors having a high linear correlation with free torque values. Linear regression models were developed to predict free torques from these pressure sensor data. The models were able to accurately predict their own free torque well (RMS error 3.72 ± 0.74 Nm), but not that of the other shoes (RMS error 10.43 ± 3.79 Nm). Models performing self-prediction were also able to measure differences in shoe stiffness. The results of the current study showed the need for participant-shoe specific linear regression models to insure high prediction accuracy of free torques from pressure sensor data during isolated internal and external rotations of the body with respect to a planted foot.
Monopole and dipole estimation for multi-frequency sky maps by linear regression

NASA Astrophysics Data System (ADS)

Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.

2017-01-01

We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
Independent contrasts and PGLS regression estimators are equivalent.

PubMed

Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

2012-05-01

We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.

A Novel Degradation Identification Method for Wind Turbine Pitch System

NASA Astrophysics Data System (ADS)

Guo, Hui-Dong

2018-04-01

It’s difficult for traditional threshold value method to identify degradation of operating equipment accurately. An novel degradation evaluation method suitable for wind turbine condition maintenance strategy implementation was proposed in this paper. Based on the analysis of typical variable-speed pitch-to-feather control principle and monitoring parameters for pitch system, a multi input multi output (MIMO) regression model was applied to pitch system, where wind speed, power generation regarding as input parameters, wheel rotation speed, pitch angle and motor driving currency for three blades as output parameters. Then, the difference between the on-line measurement and the calculated value from the MIMO regression model applying least square support vector machines (LSSVM) method was defined as the Observed Vector of the system. The Gaussian mixture model (GMM) was applied to fitting the distribution of the multi dimension Observed Vectors. Applying the model established, the Degradation Index was calculated using the SCADA data of a wind turbine damaged its pitch bearing retainer and rolling body, which illustrated the feasibility of the provided method.
Genotype-phenotype association study via new multi-task learning model

PubMed Central

Huo, Zhouyuan; Shen, Dinggang

2018-01-01

Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotype and brain image techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights of SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than compared methods and presents new insights of SNPs. PMID:29218896
Efficient design of gain-flattened multi-pump Raman fiber amplifiers using least squares support vector regression

NASA Astrophysics Data System (ADS)

Chen, Jing; Qiu, Xiaojie; Yin, Cunyi; Jiang, Hao

2018-02-01

An efficient method to design the broadband gain-flattened Raman fiber amplifier with multiple pumps is proposed based on least squares support vector regression (LS-SVR). A multi-input multi-output LS-SVR model is introduced to replace the complicated solving process of the nonlinear coupled Raman amplification equation. The proposed approach contains two stages: offline training stage and online optimization stage. During the offline stage, the LS-SVR model is trained. Owing to the good generalization capability of LS-SVR, the net gain spectrum can be directly and accurately obtained when inputting any combination of the pump wavelength and power to the well-trained model. During the online stage, we incorporate the LS-SVR model into the particle swarm optimization algorithm to find the optimal pump configuration. The design results demonstrate that the proposed method greatly shortens the computation time and enhances the efficiency of the pump parameter optimization for Raman fiber amplifier design.
Multi-center prediction of hemorrhagic transformation in acute ischemic stroke using permeability imaging features.

PubMed

Scalzo, Fabien; Alger, Jeffry R; Hu, Xiao; Saver, Jeffrey L; Dani, Krishna A; Muir, Keith W; Demchuk, Andrew M; Coutts, Shelagh B; Luby, Marie; Warach, Steven; Liebeskind, David S

2013-07-01

Permeability images derived from magnetic resonance (MR) perfusion images are sensitive to blood-brain barrier derangement of the brain tissue and have been shown to correlate with subsequent development of hemorrhagic transformation (HT) in acute ischemic stroke. This paper presents a multi-center retrospective study that evaluates the predictive power in terms of HT of six permeability MRI measures including contrast slope (CS), final contrast (FC), maximum peak bolus concentration (MPB), peak bolus area (PB), relative recirculation (rR), and percentage recovery (%R). Dynamic T2*-weighted perfusion MR images were collected from 263 acute ischemic stroke patients from four medical centers. An essential aspect of this study is to exploit a classifier-based framework to automatically identify predictive patterns in the overall intensity distribution of the permeability maps. The model is based on normalized intensity histograms that are used as input features to the predictive model. Linear and nonlinear predictive models are evaluated using a cross-validation to measure generalization power on new patients and a comparative analysis is provided for the different types of parameters. Results demonstrate that perfusion imaging in acute ischemic stroke can predict HT with an average accuracy of more than 85% using a predictive model based on a nonlinear regression model. Results also indicate that the permeability feature based on the percentage of recovery performs significantly better than the other features. This novel model may be used to refine treatment decisions in acute stroke. Copyright © 2013 Elsevier Inc. All rights reserved.
Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

PubMed

Henrard, S; Speybroeck, N; Hermans, C

2015-11-01

Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data

NASA Astrophysics Data System (ADS)

Sampson, Paul D.; Szpiro, Adam A.; Sheppard, Lianne; Lindström, Johan; Kaufman, Joel D.

2011-11-01

Statistical analyses of health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in "land use" regression models. More recently these spatial regression models have accounted for spatial correlation structure in combining monitoring data with land use covariates. We present a flexible spatio-temporal modeling framework and pragmatic, multi-step estimation procedure that accommodates essentially arbitrary patterns of missing data with respect to an ideally complete space by time matrix of observations on a network of monitoring sites. The methodology incorporates a model for smooth temporal trends with coefficients varying in space according to Partial Least Squares regressions on a large set of geographic covariates and nonstationary modeling of spatio-temporal residuals from these regressions. This work was developed to provide spatial point predictions of PM 2.5 concentrations for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) using irregular monitoring data derived from the AQS regulatory monitoring network and supplemental short-time scale monitoring campaigns conducted to better predict intra-urban variation in air quality. We demonstrate the interpretation and accuracy of this methodology in modeling data from 2000 through 2006 in six U.S. metropolitan areas and establish a basis for likelihood-based estimation.
Functional Relationships and Regression Analysis.

ERIC Educational Resources Information Center

Preece, Peter F. W.

1978-01-01

Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
Suppression Situations in Multiple Linear Regression

ERIC Educational Resources Information Center

Shieh, Gwowen

2006-01-01

This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
A comparative study between nonlinear regression and nonparametric approaches for modelling Phalaris paradoxa seedling emergence

USDA-ARS?s Scientific Manuscript database

Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
Association between central sleep apnea and left ventricular structure: the Multi-Ethnic Study of Atherosclerosis.

PubMed

Javaheri, Sogol; Sharma, Ravi K; Bluemke, David A; Redline, Susan

2017-08-01

We assessed whether the presence of central sleep apnea is associated with adverse left ventricular structural changes. We analysed 1412 participants from the Multi-Ethnic Study of Atherosclerosis who underwent both overnight polysomnography and cardiac magnetic resonance imaging. Subjects had been recruited 10 years earlier when free of cardiovascular disease. Our main exposure is the presence of central sleep apnea as defined by central apnea-hypopnea index = 5 or the presence of Cheyne-Stokes breathing. Outcome variables were left ventricular mass/height, left ventricular ejection fraction, and left ventricular mass/volume ratio. Multivariate linear regression models adjusted for age, gender, race, waist circumference, tobacco use, hypertension, and the obstructive apnea-hypopnea index were fit for the outcomes. Of the 1412 participants, 27 (2%) individuals had central sleep apnea. After adjusting for covariates, the presence of central sleep apnea was significantly associated with elevated left ventricular mass/volume ratio (β = 0.11 ± 0.04 g mL -1 , P = 0.0071), an adverse cardiac finding signifying concentric remodelling. © 2017 European Sleep Research Society.
The Use of Shrinkage Techniques in the Estimation of Attrition Rates for Large Scale Manpower Models

DTIC Science & Technology

1988-07-27

auto regressive model combined with a linear program that solves for the coefficients using MAD. But this success has diminished with time (Rowe...8217Harrison-Stevens Forcasting and the Multiprocess Dy- namic Linear Model ", The American Statistician, v.40, pp. 12 9 - 1 3 5 . 1986. 8. Box, G. E. P. and...1950. 40. McCullagh, P. and Nelder, J., Generalized Linear Models , Chapman and Hall. 1983. 41. McKenzie, E. General Exponential Smoothing and the
A new multiple regression model to identify multi-family houses with a high prevalence of sick building symptoms "SBS", within the healthy sustainable house study in Stockholm (3H).

PubMed

Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G

2010-01-01

The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regressions. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model, ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings" with the highest proportion in houses built 1961-1975 (26%) and lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.
An alternative method for quantifying coronary artery calcification: the multi-ethnic study of atherosclerosis (MESA).

PubMed

Liang, C Jason; Budoff, Matthew J; Kaufman, Joel D; Kronmal, Richard A; Brown, Elizabeth R

2012-07-02

Extent of atherosclerosis measured by amount of coronary artery calcium (CAC) in computed tomography (CT) has been traditionally assessed using thresholded scoring methods, such as the Agatston score (AS). These thresholded scores have value in clinical prediction, but important information might exist below the threshold, which would have important advantages for understanding genetic, environmental, and other risk factors in atherosclerosis. We developed a semi-automated threshold-free scoring method, the spatially weighted calcium score (SWCS) for CAC in the Multi-Ethnic Study of Atherosclerosis (MESA). Chest CT scans were obtained from 6814 participants in the Multi-Ethnic Study of Atherosclerosis (MESA). The SWCS and the AS were calculated for each of the scans. Cox proportional hazards models and linear regression models were used to evaluate the associations of the scores with CHD events and CHD risk factors. CHD risk factors were summarized using a linear predictor. Among all participants and participants with AS > 0, the SWCS and AS both showed similar strongly significant associations with CHD events (hazard ratios, 1.23 and 1.19 per doubling of SWCS and AS; 95% CI, 1.16 to 1.30 and 1.14 to 1.26) and CHD risk factors (slopes, 0.178 and 0.164; 95% CI, 0.162 to 0.195 and 0.149 to 0.179). Even among participants with AS = 0, an increase in the SWCS was still significantly associated with established CHD risk factors (slope, 0.181; 95% CI, 0.138 to 0.224). The SWCS appeared to be predictive of CHD events even in participants with AS = 0, though those events were rare as expected. The SWCS provides a valid, continuous measure of CAC suitable for quantifying the extent of atherosclerosis without a threshold, which will be useful for examining novel genetic and environmental risk factors for atherosclerosis.
A novel strategy for forensic age prediction by DNA methylation and support vector regression model

PubMed Central

Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei

2015-01-01

High deviations resulting from prediction model, gender and population difference have limited age estimation application of DNA methylation markers. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R2 > 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers with ages of 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products and 11 of them were built the age prediction models. After comparing four different models including, multivariate linear regression, multivariate nonlinear regression, back propagation neural network and support vector regression, SVR was identified as the most robust model with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years predicted by only six loci from the 11 loci, as well as an less cross-validated error compared with linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating the individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134
An Analysis of Turkey's PISA 2015 Results Using Two-Level Hierarchical Linear Modelling

ERIC Educational Resources Information Center

Atas, Dogu; Karadag, Özge

2017-01-01

In the field of education, most of the data collected are multi-level structured. Cities, city based schools, school based classes and finally students in the classrooms constitute a hierarchical structure. Hierarchical linear models give more accurate results compared to standard models when the data set has a structure going far as individuals,…
Scale of association: hierarchical linear models and the measurement of ecological systems

Treesearch

Sean M. McMahon; Jeffrey M. Diez

2007-01-01

A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance-covariance parameters in hierarchically structured...
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

NASA Astrophysics Data System (ADS)

Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

2016-03-01

From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

PubMed Central

Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

2016-01-01

From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states. PMID:26996254
Analysis of Learning Curve Fitting Techniques.

DTIC Science & Technology

1987-09-01

1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using -# ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et. al., Applied
Voluntary EMG-to-force estimation with a multi-scale physiological muscle model

PubMed Central

2013-01-01

Background EMG-to-force estimation based on muscle models, for voluntary contraction has many applications in human motion analysis. The so-called Hill model is recognized as a standard model for this practical use. However, it is a phenomenological model whereby muscle activation, force-length and force-velocity properties are considered independently. Perreault reported Hill modeling errors were large for different firing frequencies, level of activation and speed of contraction. It may be due to the lack of coupling between activation and force-velocity properties. In this paper, we discuss EMG-force estimation with a multi-scale physiology based model, which has a link to underlying crossbridge dynamics. Differently from the Hill model, the proposed method provides dual dynamics of recruitment and calcium activation. Methods The ankle torque was measured for the plantar flexion along with EMG measurements of the medial gastrocnemius (GAS) and soleus (SOL). In addition to Hill representation of the passive elements, three models of the contractile parts have been compared. Using common EMG signals during isometric contraction in four able-bodied subjects, torque was estimated by the linear Hill model, the nonlinear Hill model and the multi-scale physiological model that refers to Huxley theory. The comparison was made in normalized scale versus the case in maximum voluntary contraction. Results The estimation results obtained with the multi-scale model showed the best performances both in fast-short and slow-long term contraction in randomized tests for all the four subjects. The RMS errors were improved with the nonlinear Hill model compared to linear Hill, however it showed limitations to account for the different speed of contractions. Average error was 16.9% with the linear Hill model, 9.3% with the modified Hill model. In contrast, the error in the multi-scale model was 6.1% while maintaining a uniform estimation performance in both fast and slow contractions schemes. Conclusions We introduced a novel approach that allows EMG-force estimation based on a multi-scale physiology model integrating Hill approach for the passive elements and microscopic cross-bridge representations for the contractile element. The experimental evaluation highlights estimation improvements especially a larger range of contraction conditions with integration of the neural activation frequency property and force-velocity relationship through cross-bridge dynamics consideration. PMID:24007560

Regression Commonality Analysis: A Technique for Quantitative Theory Building

ERIC Educational Resources Information Center

Nimon, Kim; Reio, Thomas G., Jr.

2011-01-01

When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
A Flight Dynamics Model for a Multi-Actuated Flexible Rocket Vehicle

NASA Technical Reports Server (NTRS)

Orr, Jeb S.

2011-01-01

A comprehensive set of motion equations for a multi-actuated flight vehicle is presented. The dynamics are derived from a vector approach that generalizes the classical linear perturbation equations for flexible launch vehicles into a coupled three-dimensional model. The effects of nozzle and aerosurface inertial coupling, sloshing propellant, and elasticity are incorporated without restrictions on the position, orientation, or number of model elements. The present formulation is well suited to matrix implementation for large-scale linear stability and sensitivity analysis and is also shown to be extensible to nonlinear time-domain simulation through the application of a special form of Lagrange s equations in quasi-coordinates. The model is validated through frequency-domain response comparison with a high-fidelity planar implementation.
Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

PubMed

Song, Chao; Kwan, Mei-Po; Zhu, Jiping

2017-04-08

An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression

PubMed Central

Song, Chao; Kwan, Mei-Po; Zhu, Jiping

2017-01-01

An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale. PMID:28397745
Pattern Recognition Analysis of Age-Related Retinal Ganglion Cell Signatures in the Human Eye

PubMed Central

Yoshioka, Nayuta; Zangerl, Barbara; Nivison-Smith, Lisa; Khuu, Sieu K.; Jones, Bryan W.; Pfeiffer, Rebecca L.; Marc, Robert E.; Kalloniatis, Michael

2017-01-01

Purpose To characterize macular ganglion cell layer (GCL) changes with age and provide a framework to assess changes in ocular disease. This study used data clustering to analyze macular GCL patterns from optical coherence tomography (OCT) in a large cohort of subjects without ocular disease. Methods Single eyes of 201 patients evaluated at the Centre for Eye Health (Sydney, Australia) were retrospectively enrolled (age range, 20–85); 8 × 8 grid locations obtained from Spectralis OCT macular scans were analyzed with unsupervised classification into statistically separable classes sharing common GCL thickness and change with age. The resulting classes and gridwise data were fitted with linear and segmented linear regression curves. Additionally, normalized data were analyzed to determine regression as a percentage. Accuracy of each model was examined through comparison of predicted 50-year-old equivalent macular GCL thickness for the entire cohort to a true 50-year-old reference cohort. Results Pattern recognition clustered GCL thickness across the macula into five to eight spatially concentric classes. F-test demonstrated segmented linear regression to be the most appropriate model for macular GCL change. The pattern recognition–derived and normalized model revealed less difference between the predicted macular GCL thickness and the reference cohort (average ± SD 0.19 ± 0.92 and −0.30 ± 0.61 μm) than a gridwise model (average ± SD 0.62 ± 1.43 μm). Conclusions Pattern recognition successfully identified statistically separable macular areas that undergo a segmented linear reduction with age. This regression model better predicted macular GCL thickness. The various unique spatial patterns revealed by pattern recognition combined with core GCL thickness data provide a framework to analyze GCL loss in ocular disease. PMID:28632847
Understanding software faults and their role in software reliability modeling

NASA Technical Reports Server (NTRS)

Munson, John C.

1994-01-01

This study is a direct result of an on-going project to model the reliability of a large real-time control avionics system. In previous modeling efforts with this system, hardware reliability models were applied in modeling the reliability behavior of this system. In an attempt to enhance the performance of the adapted reliability models, certain software attributes were introduced in these models to control for differences between programs and also sequential executions of the same program. As the basic nature of the software attributes that affect software reliability become better understood in the modeling process, this information begins to have important implications on the software development process. A significant problem arises when raw attribute measures are to be used in statistical models as predictors, for example, of measures of software quality. This is because many of the metrics are highly correlated. Consider the two attributes: lines of code, LOC, and number of program statements, Stmts. In this case, it is quite obvious that a program with a high value of LOC probably will also have a relatively high value of Stmts. In the case of low level languages, such as assembly language programs, there might be a one-to-one relationship between the statement count and the lines of code. When there is a complete absence of linear relationship among the metrics, they are said to be orthogonal or uncorrelated. Usually the lack of orthogonality is not serious enough to affect a statistical analysis. However, for the purposes of some statistical analysis such as multiple regression, the software metrics are so strongly interrelated that the regression results may be ambiguous and possibly even misleading. Typically, it is difficult to estimate the unique effects of individual software metrics in the regression equation. The estimated values of the coefficients are very sensitive to slight changes in the data and to the addition or deletion of variables in the regression equation. Since most of the existing metrics have common elements and are linear combinations of these common elements, it seems reasonable to investigate the structure of the underlying common factors or components that make up the raw metrics. The technique we have chosen to use to explore this structure is a procedure called principal components analysis. Principal components analysis is a decomposition technique that may be used to detect and analyze collinearity in software metrics. When confronted with a large number of metrics measuring a single construct, it may be desirable to represent the set by some smaller number of variables that convey all, or most, of the information in the original set. Principal components are linear transformations of a set of random variables that summarize the information contained in the variables. The transformations are chosen so that the first component accounts for the maximal amount of variation of the measures of any possible linear transform; the second component accounts for the maximal amount of residual variation; and so on. The principal components are constructed so that they represent transformed scores on dimensions that are orthogonal. Through the use of principal components analysis, it is possible to have a set of highly related software attributes mapped into a small number of uncorrelated attribute domains. This definitively solves the problem of multi-collinearity in subsequent regression analysis. There are many software metrics in the literature, but principal component analysis reveals that there are few distinct sources of variation, i.e. dimensions, in this set of metrics. It would appear perfectly reasonable to characterize the measurable attributes of a program with a simple function of a small number of orthogonal metrics each of which represents a distinct software attribute domain.
A crystal plasticity model for slip in hexagonal close packed metals based on discrete dislocation simulations

NASA Astrophysics Data System (ADS)

Messner, Mark C.; Rhee, Moono; Arsenlis, Athanasios; Barton, Nathan R.

2017-06-01

This work develops a method for calibrating a crystal plasticity model to the results of discrete dislocation (DD) simulations. The crystal model explicitly represents junction formation and annihilation mechanisms and applies these mechanisms to describe hardening in hexagonal close packed metals. The model treats these dislocation mechanisms separately from elastic interactions among populations of dislocations, which the model represents through a conventional strength-interaction matrix. This split between elastic interactions and junction formation mechanisms more accurately reproduces the DD data and results in a multi-scale model that better represents the lower scale physics. The fitting procedure employs concepts of machine learning—feature selection by regularized regression and cross-validation—to develop a robust, physically accurate crystal model. The work also presents a method for ensuring the final, calibrated crystal model respects the physical symmetries of the crystal system. Calibrating the crystal model requires fitting two linear operators: one describing elastic dislocation interactions and another describing junction formation and annihilation dislocation reactions. The structure of these operators in the final, calibrated model reflect the crystal symmetry and slip system geometry of the DD simulations.
Commentary on the statistical properties of noise and its implication on general linear models in functional near-infrared spectroscopy.

PubMed

Huppert, Theodore J

2016-01-01

Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data

PubMed Central

Ying, Gui-shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard

2017-01-01

Purpose To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. Methods We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field data in the elderly. Results When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI −0.03 to 0.32D, P=0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, P=0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller P-values, while analysis of the worse eye provided larger P-values than mixed effects models and marginal models. Conclusion In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision. PMID:28102741
Weighted functional linear regression models for gene-based association analysis.

PubMed

Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

2018-01-01

Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
Time Series Analysis and Forecasting of Wastewater Inflow into Bandar Tun Razak Sewage Treatment Plant in Selangor, Malaysia

NASA Astrophysics Data System (ADS)

Abunama, Taher; Othman, Faridah

2017-06-01

Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to guarantee a sufficient treatment of wastewater before discharging it to the environment. The main objectives of this study are to statistically analyze and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years’ weekly influent data (156weeks) has been conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) have been tried to select the most fitted model, which was utilized to forecast the wastewater inflow rates. The linear regression analysis was applied to testify the correlation between the observed and predicted influents. ARIMA (3, 1, 3) model was selected with the highest significance R-square and lowest normalized Bayesian Information Criterion (BIC) value, and accordingly the wastewater inflow rates were forecasted to additional 52weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.
Method validation using weighted linear regression models for quantification of UV filters in water samples.

PubMed

da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues

2015-01-01

This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.
Quantum State Tomography via Linear Regression Estimation

PubMed Central

Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan

2013-01-01

A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d4) where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
[Frailty, disability and multi-morbidity: the relationship with quality of life and healthcare costs in elderly people].

PubMed

Lutomski, Jennifer E; Baars, Maria A E; Boter, Han; Buurman, Bianca M; den Elzen, Wendy P J; Jansen, Aaltje P D; Kempen, Gertrudis I J M; Steunenberg, Bas; Steyerberg, Ewout W; Olde Rikkert, Marcel G M; Melis, René J F

2014-01-01

To assess the independent and combined impact of frailty, multi-morbidity, and activities of daily living (ADL) limitations on self-reported quality of life and healthcare costs in elderly people. Cross-sectional, descriptive study. Data came from The Older Persons and Informal Caregivers Minimum DataSet (TOPICS-MDS), a pooled dataset with information from 41 projects across the Netherlands from the Dutch national care for the Elderly programme. Frailty, multi-morbidity and ADL limitations, and the interactions between these domains, were used as predictors in regression analyses with quality of life and healthcare costs as outcome measures. Analyses were stratified by living situation (independent or care home). Directionality and magnitude of associations were assessed using linear mixed models. A total of 11,093 elderly people were interviewed. A substantial proportion of elderly people living independently reported frailty, multi-morbidity, and/or ADL limitations (56.4%, 88.3% and 41.4%, respectively), as did elderly people living in a care home (88.7%, 89.2% and 77,3%, respectively). One-third of elderly people living at home (31.9%) reported all three conditions compared with two-thirds of elderly people living in a care home (68.3%). In the multivariable analysis, frailty had a strong impact on outcomes independently of multi-morbidity and ADL limitations. Elderly people experiencing problems across all three domains reported the poorest quality-of-life scores and the highest healthcare costs, irrespective of their living situation. Frailty, multi-morbidity and ADL limitations are complementary measurements, which together provide a more holistic understanding of health status in elderly people. A multi-dimensional approach is important in mapping the complex relationships between these measurements on the one hand and the quality of life and healthcare costs on the other.
Survival Data and Regression Models

NASA Astrophysics Data System (ADS)

Grégoire, G.

2014-12-01

We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
ℓ(p)-Norm multikernel learning approach for stock market price forecasting.

PubMed

Shao, Xigao; Wu, Kun; Liao, Bifeng

2012-01-01

Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.
The Use of Linear Programming for Prediction.

ERIC Educational Resources Information Center

Schnittjer, Carl J.

The purpose of the study was to develop a linear programming model to be used for prediction, test the accuracy of the predictions, and compare the accuracy with that produced by curvilinear multiple regression analysis. (Author)
Predicting birth weight with conditionally linear transformation models.

PubMed

Möst, Lisa; Schmid, Matthias; Faschingbauer, Florian; Hothorn, Torsten

2016-12-01

Low and high birth weight (BW) are important risk factors for neonatal morbidity and mortality. Gynecologists must therefore accurately predict BW before delivery. Most prediction formulas for BW are based on prenatal ultrasound measurements carried out within one week prior to birth. Although successfully used in clinical practice, these formulas focus on point predictions of BW but do not systematically quantify uncertainty of the predictions, i.e. they result in estimates of the conditional mean of BW but do not deliver prediction intervals. To overcome this problem, we introduce conditionally linear transformation models (CLTMs) to predict BW. Instead of focusing only on the conditional mean, CLTMs model the whole conditional distribution function of BW given prenatal ultrasound parameters. Consequently, the CLTM approach delivers both point predictions of BW and fetus-specific prediction intervals. Prediction intervals constitute an easy-to-interpret measure of prediction accuracy and allow identification of fetuses subject to high prediction uncertainty. Using a data set of 8712 deliveries at the Perinatal Centre at the University Clinic Erlangen (Germany), we analyzed variants of CLTMs and compared them to standard linear regression estimation techniques used in the past and to quantile regression approaches. The best-performing CLTM variant was competitive with quantile regression and linear regression approaches in terms of conditional coverage and average length of the prediction intervals. We propose that CLTMs be used because they are able to account for possible heteroscedasticity, kurtosis, and skewness of the distribution of BWs. © The Author(s) 2014.
Building and verifying a severity prediction model of acute pancreatitis (AP) based on BISAP, MEWS and routine test indexes.

PubMed

Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei

2017-10-01

To discuss the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), Modified Early Warning Score (MEWS), serum Ca2+, similarly hereinafter, and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and conducted regression analyses on the relationships of BISAP scoring, RDW, MEWS, and serum Ca2+ with the severity of AP using single-factor logistics. The variables with statistical significance in the single-factor logistic regression were used in a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic curve (ROC curve) was constructed, and the significance of multi- and single-factor prediction models in predicting the severity of AP using the area under the ROC curve (AUC) was evaluated. The internal validity of the model was verified through bootstrapping. Among 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). According to single-factor logistic regression analysis, we found that BISAP, MEWS and serum Ca2+ are prediction indexes of the severity of AP (P-value<0.001), whereas RDW is not a prediction index of AP severity (P-value>0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent prediction indexes of AP severity (P-value<0.001), and MEWS is not an independent prediction index of AP severity (P-value>0.05); BISAP is negatively related to serum Ca2+ (r=-0.330, P-value<0.001). The constructed model is as follows: ln()=7.306+1.151*BISAP-4.516*serum Ca2+. The predictive ability of each model for SAP follows the order of the combined BISAP and serum Ca2+ prediction model>Ca2+>BISAP. There is no statistical significance for the predictive ability of BISAP and serum Ca2+ (P-value>0.05); however, there is remarkable statistical significance for the predictive ability using the newly built prediction model as well as BISAP and serum Ca2+ individually (P-value<0.01). Verification of the internal validity of the models by bootstrapping is favorable. BISAP and serum Ca2+ have high predictive value for the severity of AP. However, the model built by combining BISAP and serum Ca2+ is remarkably superior to those of BISAP and serum Ca2+ individually. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
Performance characteristics of LOX-H2, tangential-entry, swirl-coaxial, rocket injectors

NASA Technical Reports Server (NTRS)

Howell, Doug; Petersen, Eric; Clark, Jim

1993-01-01

Development of a high performing swirl-coaxial injector requires an understanding of fundamental performance characteristics. This paper addresses the findings of studies on cold flow atomic characterizations which provided information on the influence of fluid properties and element operating conditions on the produced droplet sprays. These findings are applied to actual rocket conditions. The performance characteristics of swirl-coaxial injection elements under multi-element hot-fire conditions were obtained by analysis of combustion performance data from three separate test series. The injection elements are described and test results are analyzed using multi-variable linear regression. A direct comparison of test results indicated that reduced fuel injection velocity improved injection element performance through improved propellant mixing.

Global identifiability of linear compartmental models--a computer algebra algorithm.

PubMed

Audoly, S; D'Angiò, L; Saccomani, M P; Cobelli, C

1998-01-01

A priori global identifiability deals with the uniqueness of the solution for the unknown parameters of a model and is, thus, a prerequisite for parameter estimation of biological dynamic models. Global identifiability is however difficult to test, since it requires solving a system of algebraic nonlinear equations which increases both in nonlinearity degree and number of terms and unknowns with increasing model order. In this paper, a computer algebra tool, GLOBI (GLOBal Identifiability) is presented, which combines the topological transfer function method with the Buchberger algorithm, to test global identifiability of linear compartmental models. GLOBI allows for the automatic testing of a priori global identifiability of general structure compartmental models from general multi input-multi output experiments. Examples of usage of GLOBI to analyze a priori global identifiability of some complex biological compartmental models are provided.
Genetic programming based quantitative structure-retention relationships for the prediction of Kovats retention indices.

PubMed

Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S

2015-11-13

The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for the prediction of the retention behavior (such as Kovats retention index) of a solute on a chromatographic column. Commonly, multi-linear regression and artificial neural networks are used in the QSRR development in the gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for the development of quantitative structure based models predicting Kovats retention indices (KRI). The novelty of the GP formalism is that given an example dataset, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, it is not necessary to pre-specify the form of the data-fitting model in the GP-based modeling. These models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study-I) and adamantane derivatives (case study-II). In each case study, two-, three- and four-descriptor models have been developed using the KRI data available in the literature. The results of these studies clearly indicate that the GP-based models possess an excellent KRI prediction accuracy and generalization capability. Specifically, the best performing four-descriptor models in both the case studies have yielded high (>0.9) values of the coefficient of determination (R(2)) and low values of root mean squared error (RMSE) and mean absolute percent error (MAPE) for training, test and validation set data. The characteristic feature of this study is that it introduces a practical and an effective GP-based method for developing QSRRs in gas chromatography that can be gainfully utilized for developing other types of data-driven models in chromatography science. Copyright © 2015 Elsevier B.V. All rights reserved.
Multi-metals column adsorption of lead(II), cadmium(II) and manganese(II) onto natural bentonite clay.

PubMed

Alexander, Jock Asanja; Surajudeen, Abdulsalam; Aliyu, El-Nafaty Usman; Omeiza, Aroke Umar; Zaini, Muhammad Abbas Ahmad

2017-10-01

The present work was aimed at evaluating the multi-metals column adsorption of lead(II), cadmium(II) and manganese(II) ions onto natural bentonite. The bentonite clay adsorbent was characterized for physical and chemical properties using X-ray diffraction, X-ray fluorescence, Brunauer-Emmett-Teller surface area and cation exchange capacity. The column performance was evaluated using adsorbent bed height of 5.0 cm, with varying influent concentrations (10 mg/L and 50 mg/L) and flow rates (1.4 mL/min and 2.4 mL/min). The result shows that the breakthrough time for all metal ions ranged from 50 to 480 minutes. The maximum adsorption capacity was obtained at initial concentration of 10 mg/L and flow rate of 1.4 mL/min, with 2.22 mg/g of lead(II), 1.71 mg/g of cadmium(II) and 0.37 mg/g of manganese(II). The order of metal ions removal by natural bentonite is lead(II) > cadmium(II) > manganese(II). The sorption performance and the dynamic behaviour of the column were predicted using Adams-Bohart, Thomas, and Yoon-Nelson models. The linear regression analysis demonstrated that the Thomas and Yoon-Nelson models fitted well with the column adsorption data for all metal ions. The natural bentonite was effective for the treatment of wastewater laden with multi-metals, and the process parameters obtained from this work can be used at the industrial scale.
Analyzing industrial energy use through ordinary least squares regression models

NASA Astrophysics Data System (ADS)

Golden, Allyson Katherine

Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
Real-time soil sensing based on fiber optics and spectroscopy

NASA Astrophysics Data System (ADS)

Li, Minzan

2005-08-01

Using NIR spectroscopic techniques, correlation analysis and regression analysis for soil parameter estimation was conducted with raw soil samples collected in a cornfield and a forage field. Soil parameters analyzed were soil moisture, soil organic matter, nitrate nitrogen, soil electrical conductivity and pH. Results showed that all soil parameters could be evaluated by NIR spectral reflectance. For soil moisture, a linear regression model was available at low moisture contents below 30 % db, while an exponential model can be used in a wide range of moisture content up to 100 % db. Nitrate nitrogen estimation required a multi-spectral exponential model and electrical conductivity could be evaluated by a single spectral regression. According to the result above mentioned, a real time soil sensor system based on fiber optics and spectroscopy was developed. The sensor system was composed of a soil subsoiler with four optical fiber probes, a spectrometer, and a control unit. Two optical fiber probes were used for illumination and the other two optical fiber probes for collecting soil reflectance from visible to NIR wavebands at depths around 30 cm. The spectrometer was used to obtain the spectra of reflected lights. The control unit consisted of a data logging device, a personal computer, and a pulse generator. The experiment showed that clear photo-spectral reflectance was obtained from the underground soil. The soil reflectance was equal to that obtained by the desktop spectrophotometer in laboratory tests. Using the spectral reflectance, the soil parameters, such as soil moisture, pH, EC and SOM, were evaluated.
Stability analysis of the phytoplankton effect model on changes in nitrogen concentration on integrated multi-trophic aquaculture systems

NASA Astrophysics Data System (ADS)

Widowati; Putro, S. P.; Silfiana

2018-05-01

Integrated Multi-Trophic Aquaculture (IMTA) is a polyculture with several biotas maintained in it to optimize waste recycling as a food source. The interaction between phytoplankton and nitrogen as waste in fish cultivation including ammonia, nitrite, and nitrate studied in the form of mathematical models. The form model is non-linear systems of differential equations with the four variables. The analytical analysis was used to study the dynamic behavior of this model. Local stability analysis is performed at the equilibrium point with the first step linearized model by using Taylor series, then determined the Jacobian matrix. If all eigenvalues have negative real parts, then the equilibrium of the system is locally asymptotic stable. Some numerical simulations were also demonstrated to verify our analytical result.
A model of the human in a cognitive prediction task.

NASA Technical Reports Server (NTRS)

Rouse, W. B.

1973-01-01

The human decision maker's behavior when predicting future states of discrete linear dynamic systems driven by zero-mean Gaussian processes is modeled. The task is on a slow enough time scale that physiological constraints are insignificant compared with cognitive limitations. The model is basically a linear regression system identifier with a limited memory and noisy observations. Experimental data are presented and compared to the model.
Spacebased Estimation of Moisture Transport in Marine Atmosphere Using Support Vector Regression

NASA Technical Reports Server (NTRS)

Xie, Xiaosu; Liu, W. Timothy; Tang, Benyang

2007-01-01

An improved algorithm is developed based on support vector regression (SVR) to estimate horizonal water vapor transport integrated through the depth of the atmosphere ((Theta)) over the global ocean from observations of surface wind-stress vector by QuikSCAT, cloud drift wind vector derived from the Multi-angle Imaging SpectroRadiometer (MISR) and geostationary satellites, and precipitable water from the Special Sensor Microwave/Imager (SSM/I). The statistical relation is established between the input parameters (the surface wind stress, the 850 mb wind, the precipitable water, time and location) and the target data ((Theta) calculated from rawinsondes and reanalysis of numerical weather prediction model). The results are validated with independent daily rawinsonde observations, monthly mean reanalysis data, and through regional water balance. This study clearly demonstrates the improvement of (Theta) derived from satellite data using SVR over previous data sets based on linear regression and neural network. The SVR methodology reduces both mean bias and standard deviation comparedwith rawinsonde observations. It agrees better with observations from synoptic to seasonal time scales, and compare more favorably with the reanalysis data on seasonal variations. Only the SVR result can achieve the water balance over South America. The rationale of the advantage by SVR method and the impact of adding the upper level wind will also be discussed.
A Comparison of Conventional Linear Regression Methods and Neural Networks for Forecasting Educational Spending.

ERIC Educational Resources Information Center

Baker, Bruce D.; Richards, Craig E.

1999-01-01

Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
Minimizing bias in biomass allometry: Model selection and log transformation of data

Treesearch

Joseph Mascaro; undefined undefined; Flint Hughes; Amanda Uowolo; Stefan A. Schnitzer

2011-01-01

Nonlinear regression is increasingly used to develop allometric equations for forest biomass estimation (i.e., as opposed to the raditional approach of log-transformation followed by linear regression). Most statistical software packages, however, assume additive errors by default, violating a key assumption of allometric theory and possibly producing spurious models....
A comparative study between nonlinear regression and artificial neural network approaches for modelling wild oat (Avena fatua) field emergence

USDA-ARS?s Scientific Manuscript database

Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...
Toward customer-centric organizational science: A common language effect size indicator for multiple linear regressions and regressions with higher-order terms.

PubMed

Krasikova, Dina V; Le, Huy; Bachura, Eric

2018-06-01

To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Schistosomiasis Breeding Environment Situation Analysis in Dongting Lake Area

NASA Astrophysics Data System (ADS)

Li, Chuanrong; Jia, Yuanyuan; Ma, Lingling; Liu, Zhaoyan; Qian, Yonggang

2013-01-01

Monitoring environmental characteristics, such as vegetation, soil moisture et al., of Oncomelania hupensis (O. hupensis)’ spatial/temporal distribution is of vital importance to the schistosomiasis prevention and control. In this study, the relationship between environmental factors derived from remotely sensed data and the density of O. hupensis was analyzed by a multiple linear regression model. Secondly, spatial analysis of the regression residual was investigated by the semi-variogram method. Thirdly, spatial analysis of the regression residual and the multiple linear regression model were both employed to estimate the spatial variation of O. hupensis density. Finally, the approach was used to monitor and predict the spatial and temporal variations of oncomelania of Dongting Lake region, China. And the areas of potential O. hupensis habitats were predicted and the influence of Three Gorges Dam (TGB)project on the density of O. hupensis was analyzed.
Explanatory Power of Multi-scale Physical Descriptors in Modeling Benthic Indices Across Nested Ecoregions of the Pacific Northwest

NASA Astrophysics Data System (ADS)

Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.

2005-05-01

Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.
Representational change and strategy use in children's number line estimation during the first years of primary school.

PubMed

White, Sonia L J; Szűcs, Dénes

2012-01-04

The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also provide a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice.
Representational change and strategy use in children's number line estimation during the first years of primary school

PubMed Central

2012-01-01

Background The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also provide a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Methods Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Results Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. Conclusion In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice. PMID:22217191
The Use of Linear Instrumental Variables Methods in Health Services Research and Health Economics: A Cautionary Note

PubMed Central

Terza, Joseph V; Bradford, W David; Dismuke, Clara E

2008-01-01

Objective To investigate potential bias in the use of the conventional linear instrumental variables (IV) method for the estimation of causal effects in inherently nonlinear regression settings. Data Sources Smoking Supplement to the 1979 National Health Interview Survey, National Longitudinal Alcohol Epidemiologic Survey, and simulated data. Study Design Potential bias from the use of the linear IV method in nonlinear models is assessed via simulation studies and real world data analyses in two commonly encountered regression setting: (1) models with a nonnegative outcome (e.g., a count) and a continuous endogenous regressor; and (2) models with a binary outcome and a binary endogenous regressor. Principle Findings The simulation analyses show that substantial bias in the estimation of causal effects can result from applying the conventional IV method in inherently nonlinear regression settings. Moreover, the bias is not attenuated as the sample size increases. This point is further illustrated in the survey data analyses in which IV-based estimates of the relevant causal effects diverge substantially from those obtained with appropriate nonlinear estimation methods. Conclusions We offer this research as a cautionary note to those who would opt for the use of linear specifications in inherently nonlinear settings involving endogeneity. PMID:18546544
Developing a predictive tropospheric ozone model for Tabriz

NASA Astrophysics Data System (ADS)

Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi

2013-04-01

Predictive ozone models are becoming indispensable tools by providing a capability for pollution alerts to serve people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, by using the following five modeling strategies: three regression-type methods: Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models: Nonlinear Local Prediction (NLP) to implement chaos theory and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type modeling strategies explain the data in terms of: temperature, solar radiation, dew point temperature, and wind speed, by regressing present ozone values to their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for MLR, ANN and GEP models are not overly good but those produced by NLP and ARIMA are promising for the establishing a forecasting capability.
Changes in Clavicle Length and Maturation in Americans: 1840-1980.

PubMed

Langley, Natalie R; Cridlin, Sandra

2016-01-01

Secular changes refer to short-term biological changes ostensibly due to environmental factors. Two well-documented secular trends in many populations are earlier age of menarche and increasing stature. This study synthesizes data on maximum clavicle length and fusion of the medial epiphysis in 1840-1980 American birth cohorts to provide a comprehensive assessment of developmental and morphological change in the clavicle. Clavicles from the Hamann-Todd Human Osteological Collection (n = 354), McKern and Stewart Korean War males (n = 341), Forensic Anthropology Data Bank (n = 1,239), and the McCormick Clavicle Collection (n = 1,137) were used in the analysis. Transition analysis was used to evaluate fusion of the medial epiphysis (scored as unfused, fusing, or fused). Several statistical treatments were used to assess fluctuations in maximum clavicle length. First, Durbin-Watson tests were used to evaluate autocorrelation, and a local regression (LOESS) was used to identify visual shifts in the regression slope. Next, piecewise regression was used to fit linear regression models before and after the estimated breakpoints. Multiple starting parameters were tested in the range determined to contain the breakpoint, and the model with the smallest mean squared error was chosen as the best fit. The parameters from the best-fit models were then used to derive the piecewise models, which were compared with the initial simple linear regression models to determine which model provided the best fit for the secular change data. The epiphyseal union data indicate a decline in the age at onset of fusion since the early twentieth century. Fusion commences approximately four years earlier in mid- to late twentieth-century birth cohorts than in late nineteenth- and early twentieth-century birth cohorts. However, fusion is completed at roughly the same age across cohorts. The most significant decline in age at onset of epiphyseal union appears to have occurred since the mid-twentieth century. LOESS plots show a breakpoint in the clavicle length data around the mid-twentieth century in both sexes, and piecewise regression models indicate a significant decrease in clavicle length in the American population after 1940. The piecewise model provides a slightly better fit than the simple linear model. Since the model standard error is not substantially different from the piecewise model, an argument could be made to select the less complex linear model. However, we chose the piecewise model to detect changes in clavicle length that are overfitted with a linear model. The decrease in maximum clavicle length is in line with a documented narrowing of the American skeletal form, as shown by analyses of cranial and facial breadth and bi-iliac breadth of the pelvis. Environmental influences on skeletal form include increases in body mass index, health improvements, improved socioeconomic status, and elimination of infectious diseases. Secular changes in bony dimensions and skeletal maturation stipulate that medical and forensic standards used to deduce information about growth, health, and biological traits must be derived from modern populations.
Modification of the USLE K factor for soil erodibility assessment on calcareous soils in Iran

NASA Astrophysics Data System (ADS)

Ostovari, Yaser; Ghorbani-Dashtaki, Shoja; Bahrami, Hossein-Ali; Naderi, Mehdi; Dematte, Jose Alexandre M.; Kerry, Ruth

2016-11-01

The measurement of soil erodibility (K) in the field is tedious, time-consuming and expensive; therefore, its prediction through pedotransfer functions (PTFs) could be far less costly and time-consuming. The aim of this study was to develop new PTFs to estimate the K factor using multiple linear regression, Mamdani fuzzy inference systems, and artificial neural networks. For this purpose, K was measured in 40 erosion plots with natural rainfall. Various soil properties including the soil particle size distribution, calcium carbonate equivalent, organic matter, permeability, and wet-aggregate stability were measured. The results showed that the mean measured K was 0.014 t h MJ- 1 mm- 1 and 2.08 times less than the estimated mean K (0.030 t h MJ- 1 mm- 1) using the USLE model. Permeability, wet-aggregate stability, very fine sand, and calcium carbonate were selected as independent variables by forward stepwise regression in order to assess the ability of multiple linear regression, Mamdani fuzzy inference systems and artificial neural networks to predict K. The calcium carbonate equivalent, which is not accounted for in the USLE model, had a significant impact on K in multiple linear regression due to its strong influence on the stability of aggregates and soil permeability. Statistical indices in validation and calibration datasets determined that the artificial neural networks method with the highest R2, lowest RMSE, and lowest ME was the best model for estimating the K factor. A strong correlation (R2 = 0.81, n = 40, p < 0.05) between the estimated K from multiple linear regression and measured K indicates that the use of calcium carbonate equivalent as a predictor variable gives a better estimation of K in areas with calcareous soils.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.