Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate the simple linear regression. The first one is based on the famous Galton's data set on heredity. We use the lm R command and get coefficients estimates, standard error of the error, R2, residuals …In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate on multiple linear regression. This pratical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
Practical Session: Multiple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Three exercises are proposed to illustrate the simple linear regression. In the first one investigates the influence of several factors on atmospheric pollution. It has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data coming from 20 cities of U.S. Exercise 2 is an introduction to model selection whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).
A tutorial on Bayesian Normal linear regression
NASA Astrophysics Data System (ADS)
Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens
2015-12-01
Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
Three-Dimensional Modeling in Linear Regression.
ERIC Educational Resources Information Center
Herman, James D.
Linear regression examines the relationship between one or more independent (predictor) variables and a dependent variable. By using a particular formula, regression determines the weights needed to minimize the error term for a given set of predictors. With one predictor variable, the relationship between the predictor and the dependent variable…
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R(2)) indicates the importance of independent variables in the outcome.
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Discriminative Elastic-Net Regularized Linear Regression.
Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen
2017-03-01
In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminate representations to make final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets well demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods can be available at http://www.yongxu.org/lunwen.html.
A Gibbs sampler for multivariate linear regression
NASA Astrophysics Data System (ADS)
Mantz, Adam B.
2016-04-01
Kelly described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modelled by a flexible mixture of Gaussians rather than assumed to be uniform. Here, I extend the Kelly algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Secondly, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamically relaxed galaxy clusters as a function of their mass and redshift. An implementation of the Gibbs sampler in the R language, called LRGS, is provided.
Moving the Bar: Transformations in Linear Regression.
ERIC Educational Resources Information Center
Miranda, Janet
The assumption that is most important to the hypothesis testing procedure of multiple linear regression is the assumption that the residuals are normally distributed, but this assumption is not always tenable given the realities of some data sets. When normal distribution of the residuals is not met, an alternative method can be initiated. As an…
Multiple linear regression for isotopic measurements
NASA Astrophysics Data System (ADS)
Garcia Alonso, J. I.
2012-04-01
There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection man-made variations using enriched isotopes as indicators. For both type of measurements accurate and precise isotope ratio measurements are required. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr or Nd and Sm is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. Also, this procedure allows matrix separation for precise and accurate Sr and Nd isotope ratios to be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography and on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data resulted in Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for the labelling of individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique for a given individual, can be inherited in living organisms and it is stable. The detection of the isotopic signature is based also on multiple linear regression. The labelling of fish and its detection in otoliths by Laser Ablation ICP-MS will be discussed using trout and salmon as examples. As a conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.
Double linear regression classification for face recognition
NASA Astrophysics Data System (ADS)
Feng, Qingxiang; Zhu, Qi; Tang, Lin-Lin; Pan, Jeng-Shyang
2015-02-01
A new classifier designed based on linear regression classification (LRC) classifier and simple-fast representation-based classifier (SFR), named double linear regression classification (DLRC) classifier, is proposed for image recognition in this paper. As we all know, the traditional LRC classifier only uses the distance between test image vectors and predicted image vectors of the class subspace for classification. And the SFR classifier uses the test image vectors and the nearest image vectors of the class subspace to classify the test sample. However, the DLRC classifier computes out the predicted image vectors of each class subspace and uses all the predicted vectors to construct a novel robust global space. Then, the DLRC utilizes the novel global space to get the novel predicted vectors of each class for classification. A mass number of experiments on AR face database, JAFFE face database, Yale face database, Extended YaleB face database, and PIE face database are used to evaluate the performance of the proposed classifier. The experimental results show that the proposed classifier achieves better recognition rate than the LRC classifier, SFR classifier, and several other classifiers.
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
Sparse brain network using penalized linear regression
NASA Astrophysics Data System (ADS)
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
2011-03-01
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with a l1-norm penalty. The method is applied to brain network consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of the autism spectrum disorder (ASD) children and the pediatric control (PedCon) subjects. To validate the results, we check their reproducibilities of the obtained brain networks by the leave-one-out cross validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
Subgroup finding via Bayesian additive regression trees.
Sivaganesan, Siva; Müller, Peter; Huang, Bin
2017-03-09
We provide a Bayesian decision theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows a flexible modeling of the response variable irrespective of potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility where the expectation is taken with respect to the posterior predictive distribution of the response, and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using simulated data set study and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
Suppression Situations in Multiple Linear Regression
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
A Model for Quadratic Outliers in Linear Regression.
ERIC Educational Resources Information Center
Elashoff, Janet Dixon; Elashoff, Robert M.
This paper introduces a model for describing outliers (observations which are extreme in some sense or violate the apparent pattern of other observations) in linear regression which can be viewed as a mixture of a quadratic and a linear regression. The maximum likelihood estimators of the parameters in the model are derived and their asymptotic…
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values so that the data become high dimensional data suffering from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Predicting cognitive data from medical images using sparse linear regression.
Kandel, Benjamin M; Wolk, David A; Gee, James C; Avants, Brian
2013-01-01
We present a new framework for predicting cognitive or other continuous-variable data from medical images. Current methods of probing the connection between medical images and other clinical data typically use voxel-based mass univariate approaches. These approaches do not take into account the multivariate, network-based interactions between the various areas of the brain and do not give readily interpretable metrics that describe how strongly cognitive function is related to neuroanatomical structure. On the other hand, high-dimensional machine learning techniques do not typically provide a direct method for discovering which parts of the brain are used for making predictions. We present a framework, based on recent work in sparse linear regression, that addresses both drawbacks of mass univariate approaches, while preserving the direct spatial interpretability that they provide. In addition, we present a novel optimization algorithm that adapts the conjugate gradient method for sparse regression on medical imaging data. This algorithm produces coefficients that are more interpretable than existing sparse regression techniques.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Learning a Nonnegative Sparse Graph for Linear Regression.
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung
2015-09-01
Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning a NNSG for linear regression was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perceptiveness for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.
Linear regression analysis of survival data with missing censoring indicators
Wang, Qihua
2010-01-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial. PMID:20559722
Biostatistics Series Module 6: Correlation and Linear Regression.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r(2) denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
How Robust Is Linear Regression with Dummy Variables?
ERIC Educational Resources Information Center
Blankmeyer, Eric
2006-01-01
Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
A linear regression solution to the spatial autocorrelation problem
NASA Astrophysics Data System (ADS)
Griffith, Daniel A.
The Moran Coefficient spatial autocorrelation index can be decomposed into orthogonal map pattern components. This decomposition relates it directly to standard linear regression, in which corresponding eigenvectors can be used as predictors. This paper reports comparative results between these linear regressions and their auto-Gaussian counterparts for the following georeferenced data sets: Columbus (Ohio) crime, Ottawa-Hull median family income, Toronto population density, southwest Ohio unemployment, Syracuse pediatric lead poisoning, and Glasgow standard mortality rates, and a small remotely sensed image of the High Peak district. This methodology is extended to auto-logistic and auto-Poisson situations, with selected data analyses including percentage of urban population across Puerto Rico, and the frequency of SIDs cases across North Carolina. These data analytic results suggest that this approach to georeferenced data analysis offers considerable promise.
Imbedding linear regressions in models for factor crossing
NASA Astrophysics Data System (ADS)
Santos, Carla; Nunes, Célia; Dias, Cristina; Varadinov, Maria; Mexia, João T.
2016-12-01
Given u factors with J1, …, Ju levels we are led to test their effects and interactions. For this we consider an orthogonal partition of Rn, with n =∏l=1uJl, in subspaces associated with the sets of factors. The space corresponding to the set C will have density g (C )=∏l∈C(Jl-1) so that g({1, …, u}) will be much larger than the other number of degrees of freedom when Jl > 2, l = 1, …, u This fact may be used to enrich these models imbedding in them linear regressions.
Modeling pan evaporation for Kuwait by multiple linear regression.
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values.
Adaptive local linear regression with application to printer color management.
Gupta, Maya R; Garcia, Eric K; Chin, Erika
2008-06-01
Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby training samples, neighbors. Usually, the number of neighbors used in estimation is fixed to be a global "optimal" value, chosen by cross validation. This paper proposes adapting the number of neighbors used for estimation to the local geometry of the data, without need for cross validation. The term enclosing neighborhood is introduced to describe a set of neighbors whose convex hull contains the test point when possible. It is proven that enclosing neighborhoods yield bounded estimation variance under some assumptions. Three such enclosing neighborhood definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness of these neighborhood definitions with local linear regression is tested for estimating lookup tables for color management. Significant improvements in error metrics are shown, indicating that enclosing neighborhoods may be a promising adaptive neighborhood definition for other local learning tasks as well, depending on the density of training samples.
2012-01-01
Background Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods-- ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net-- for predicting GEBV using dense SNP markers. Methods We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all the six methods that use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error, the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). Results The elastic net, lasso, adaptive lasso and the
Scarneciu, Camelia C.; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D.; Varciu, Mihai S.; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
Objectives: This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. Methods: The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). Results: From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. Conclusions: The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is
Prediction by linear regression on a quantum computer
NASA Astrophysics Data System (ADS)
Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco
2016-08-01
We give an algorithm for prediction on a quantum computer which is based on a linear regression model with least-squares optimization. In contrast to related previous contributions suffering from the problem of reading out the optimal parameters of the fit, our scheme focuses on the machine-learning task of guessing the output corresponding to a new input given examples of data points. Furthermore, we adapt the algorithm to process nonsparse data matrices that can be represented by low-rank approximations, and significantly improve the dependency on its condition number. The prediction result can be accessed through a single-qubit measurement or used for further quantum information processing routines. The algorithm's runtime is logarithmic in the dimension of the input space provided the data is given as quantum information as an input to the routine.
Contiguous Uniform Deviation for Multiple Linear Regression in Pattern Recognition
NASA Astrophysics Data System (ADS)
Andriana, A. S.; Prihatmanto, D.; Hidaya, E. M. I.; Supriana, I.; Machbub, C.
2017-01-01
Understanding images by recognizing its objects is still a challenging task. Face elements detection has been developed by researchers but not yet shows enough information (low resolution in information) needed for recognizing objects. Available face recognition methods still have error in classification and need a huge amount of examples which may still be incomplete. Another approach which is still rare in understanding images uses pattern structures or syntactic grammars describing shape detail features. Image pixel values are also processed as signal patterns which are approximated by mathematical function curve fitting. This paper attempts to add contiguous uniform deviation method to curve fitting algorithm to increase applicability in image recognition system related to object movement. The combination of multiple linear regression and contiguous uniform deviation method are applied to the function of image pixel values, and show results in higher resolution (more information) of visual object detail description in object movement.
Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions.
Kumar, Gaurav; Bajaj, Rakesh Kumar
2014-01-01
In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept on triangular intuitionistic fuzzy number for its one-to-one correspondence with its position and entropies. Using the concept of fuzzy entropy the estimators of the intuitionistic fuzzy regression coefficients have been estimated in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984
The Allometry of Coarse Root Biomass: Log-Transformed Linear Regression or Nonlinear Regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J.; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees. PMID:24116197
Outlier Detection In Linear Regression Using Standart Parity Space Approach
NASA Astrophysics Data System (ADS)
Mustafa Durdag, Utkan; Hekimoglu, Serif
2013-04-01
Despite all technological advancements, outliers may occur due to some mistakes in engineering measurements. Before estimation of unknown parameters, aforementioned outliers must be detected and removed from the measurements. There are two main outlier detection methods: the conventional tests based on least square approach (e.g. Baarda, Pope etc.) and the robust tests (e.g. Huber, Hampel etc.) are used to identify outliers in a set of measurement. Standart Parity Space Approach is one of the important model-based Fault Detection and Isolation (FDI) technique that usually uses in Control Engineering. In this study the standart parity space method is used for outlier detection in linear regression. Our main goal is to compare success of two approaches of standart parity space method and conventional tests in linear regression through the Monte Carlo simulation with each other. The least square estimation is the most common estimator as known and it minimizes the sum of squared residuals. In standart parity space approach to eliminate unknown vector, the measurement vector projected onto the left null space of the coefficient matrix. Thus, the orthogonal condition of parity vector is satisfied and only the effects of noise vector noticed. The residual vector is derived from two cases that one is absence of an outlier; the other is occurrence of an outlier. Its likelihood function is used for determining the detection decision function for global Test. Localization decision function is calculated for each column of parity matrix and the maximum one of these values is accepted as an outlier. There are some results obtained from two different intervals that one of them is between 3σ and 6σ (small outlier) the other one is between 6σ and 12σ (large outlier) for outlier generator when the number of unknown parameter is chosen 2 and 3. The measure success rates (MSR) of Baarda's method is better than the standart parity space method when the confidence intervals are
Forecasting Groundwater Temperature with Linear Regression Models Using Historical Data.
Figura, Simon; Livingstone, David M; Kipfer, Rolf
2015-01-01
Although temperature is an important determinant of many biogeochemical processes in groundwater, very few studies have attempted to forecast the response of groundwater temperature to future climate warming. Using a composite linear regression model based on the lagged relationship between historical groundwater and regional air temperature data, empirical forecasts were made of groundwater temperature in several aquifers in Switzerland up to the end of the current century. The model was fed with regional air temperature projections calculated for greenhouse-gas emissions scenarios A2, A1B, and RCP3PD. Model evaluation revealed that the approach taken is adequate only when the data used to calibrate the models are sufficiently long and contain sufficient variability. These conditions were satisfied for three aquifers, all fed by riverbank infiltration. The forecasts suggest that with respect to the reference period 1980 to 2009, groundwater temperature in these aquifers will most likely increase by 1.1 to 3.8 K by the end of the current century, depending on the greenhouse-gas emissions scenario employed.
Robust cross-validation of linear regression QSAR models.
Konovalov, Dmitry A; Llewellyn, Lyndon E; Vander Heyden, Yvan; Coomans, Danny
2008-10-01
A quantitative structure-activity relationship (QSAR) model is typically developed to predict the biochemical activity of untested compounds from the compounds' molecular structures. "The gold standard" of model validation is the blindfold prediction when the model's predictive power is assessed from how well the model predicts the activity values of compounds that were not considered in any way during the model development/calibration. However, during the development of a QSAR model, it is necessary to obtain some indication of the model's predictive power. This is often done by some form of cross-validation (CV). In this study, the concepts of the predictive power and fitting ability of a multiple linear regression (MLR) QSAR model were examined in the CV context allowing for the presence of outliers. Commonly used predictive power and fitting ability statistics were assessed via Monte Carlo cross-validation when applied to percent human intestinal absorption, blood-brain partition coefficient, and toxicity values of saxitoxin QSAR data sets, as well as three known benchmark data sets with known outlier contamination. It was found that (1) a robust version of MLR should always be preferred over the ordinary-least-squares MLR, regardless of the degree of outlier contamination and that (2) the model's predictive power should only be assessed via robust statistics. The Matlab and java source code used in this study is freely available from the QSAR-BENCH section of www.dmitrykonovalov.org for academic use. The Web site also contains the java-based QSAR-BENCH program, which could be run online via java's Web Start technology (supporting Windows, Mac OSX, Linux/Unix) to reproduce most of the reported results or apply the reported procedures to other data sets.
A multiple additive regression tree analysis of three exposure measures during Hurricane Katrina.
Curtis, Andrew; Li, Bin; Marx, Brian D; Mills, Jacqueline W; Pine, John
2011-01-01
This paper analyses structural and personal exposure to Hurricane Katrina. Structural exposure is measured by flood height and building damage; personal exposure is measured by the locations of 911 calls made during the response. Using these variables, this paper characterises the geography of exposure and also demonstrates the utility of a robust analytical approach in understanding health-related challenges to disadvantaged populations during recovery. Analysis is conducted using a contemporary statistical approach, a multiple additive regression tree (MART), which displays considerable improvement over traditional regression analysis. By using MART, the percentage of improvement in R-squares over standard multiple linear regression ranges from about 62 to more than 100 per cent. The most revealing finding is the modelled verification that African Americans experienced disproportionate exposure in both structural and personal contexts. Given the impact of exposure to health outcomes, this finding has implications for understanding the long-term health challenges facing this population.
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Kim, Dae-Hee; Choi, Jae-Hun; Lim, Myung-Eun; Park, Soo-Jun
2008-01-01
This paper suggests the method of correcting distance between an ambient intelligence display and a user based on linear regression and smoothing method, by which distance information of a user who approaches to the display can he accurately output even in an unanticipated condition using a passive infrared VIR) sensor and an ultrasonic device. The developed system consists of an ambient intelligence display and an ultrasonic transmitter, and a sensor gateway. Each module communicates with each other through RF (Radio frequency) communication. The ambient intelligence display includes an ultrasonic receiver and a PIR sensor for motion detection. In particular, this system selects and processes algorithms such as smoothing or linear regression for current input data processing dynamically through judgment process that is determined using the previous reliable data stored in a queue. In addition, we implemented GUI software with JAVA for real time location tracking and an ambient intelligence display.
Identifying predictors of physics item difficulty: A linear regression approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge
Sample Sizes when Using Multiple Linear Regression for Prediction
ERIC Educational Resources Information Center
Knofczynski, Gregory T.; Mundfrom, Daniel
2008-01-01
When using multiple regression for prediction purposes, the issue of minimum required sample size often needs to be addressed. Using a Monte Carlo simulation, models with varying numbers of independent variables were examined and minimum sample sizes were determined for multiple scenarios at each number of independent variables. The scenarios…
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
ERIC Educational Resources Information Center
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2016-12-31
Statistical training psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well has how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses.
ERIC Educational Resources Information Center
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
WAVELET-BASED BAYESIAN ESTIMATION OF PARTIALLY LINEAR REGRESSION MODELSWITH LONG MEMORY ERRORS
Ko, Kyungduk; Qu, Leming; Vannucci, Marina
2013-01-01
In this paper we focus on partially linear regression models with long memory errors, and propose a wavelet-based Bayesian procedure that allows the simultaneous estimation of the model parameters and the nonparametric part of the model. Employing discrete wavelet transforms is crucial in order to simplify the dense variance-covariance matrix of the long memory error. We achieve a fully Bayesian inference by adopting a Metropolis algorithm within a Gibbs sampler. We evaluate the performances of the proposed method on simulated data. In addition, we present an application to Northern hemisphere temperature data, a benchmark in the long memory literature. PMID:23946613
NASA Astrophysics Data System (ADS)
Vesnin, V. L.; Muradov, V. G.
2012-09-01
Absorption spectra of multicomponent hydrocarbon mixtures based on n-heptane and isooctane with addition of benzene (up to 1%) and toluene and o-xylene (up to 20%) were investigated experimentally in the region of the first overtones of the hydrocarbon groups (λ = 1620-1780 nm). It was shown that their concentrations could be determined separately by using a multiple linear regression method. The optimum result was obtained by including four wavelengths at 1671, 1680, 1685, and 1695 nm, which took into account absorption of CH groups in benzene, toluene, and o-xylene and CH3 groups, respectively.
Multiple regression technique for Pth degree polynominals with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
Divergent estimation error in portfolio optimization and in linear regression
NASA Astrophysics Data System (ADS)
Kondor, I.; Varga-Haszonits, I.
2008-08-01
The problem of estimation error in portfolio optimization is discussed, in the limit where the portfolio size N and the sample size T go to infinity such that their ratio is fixed. The estimation error strongly depends on the ratio N/T and diverges for a critical value of this parameter. This divergence is the manifestation of an algorithmic phase transition, it is accompanied by a number of critical phenomena, and displays universality. As the structure of a large number of multidimensional regression and modelling problems is very similar to portfolio optimization, the scope of the above observations extends far beyond finance, and covers a large number of problems in operations research, machine learning, bioinformatics, medical science, economics, and technology.
Mamouridis, Valeria; Klein, Nadja; Kneib, Thomas; Cadarso Suarez, Carmen; Maynou, Francesc
2017-01-01
We analysed the landings per unit effort (LPUE) from the Barcelona trawl fleet targeting the red shrimp (Aristeus antennatus) using novel Bayesian structured additive distributional regression to gain a better understanding of the dynamics and determinants of variation in LPUE. The data set, covering a time span of 17 years, includes fleet-dependent variables (e.g. the number of trips performed by vessels), temporal variables (inter- and intra-annual variability) and environmental variables (the North Atlantic Oscillation index). Based on structured additive distributional regression, we evaluate (i) the gain in replacing purely linear predictors by additive predictors including nonlinear effects of continuous covariates, (ii) the inclusion of vessel-specific effects based on either fixed or random effects, (iii) different types of distributions for the response, and (iv) the potential gain in not only modelling the location but also the scale/shape parameter of these distributions. Our findings support that flexible model variants are indeed able to improve the fit considerably and that additional insights can be gained. Tools to select within several model specifications and assumptions are discussed in detail as well.
Simultaneous Determination of Cobalt, Copper, and Nickel by Multivariate Linear Regression.
ERIC Educational Resources Information Center
Dado, Greg; Rosenthal, Jeffrey
1990-01-01
Presented is an experiment where the concentrations of three metal ions in a solution are simultaneously determined by ultraviolet-vis spectroscopy. Availability of the computer program used for statistically analyzing data using a multivariate linear regression is listed. (KR)
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Analysis of Covariance with Linear Regression Error Model on Antenna Control Unit Tracking
2015-10-20
412TW-PA-15238 Analysis of Covariance with Linear Regression Error Model on Antenna Control Unit Tracking DANIEL T. LAIRD AIR...COVERED (From - To) 20 OCT 15 – 23 OCT 15 4. TITLE AND SUBTITLE Analysis of Covariance with Linear Regression Error Model on Antenna Control Tracking...analysis of variance (ANOVA) to decide for the null- or alternative-hypotheses of a telemetry antenna control unit’s (ACU) ability to track on C-band
Linear Regression in High Dimension and/or for Correlated Inputs
NASA Astrophysics Data System (ADS)
Jacques, J.; Fraix-Burnet, D.
2014-12-01
Ordinary least square is the common way to estimate linear regression models. When inputs are correlated or when they are too numerous, regression methods using derived inputs directions or shrinkage methods can be efficient alternatives. Methods using derived inputs directions build new uncorrelated variables as linear combination of the initial inputs, whereas shrinkage methods introduce regularization and variable selection by penalizing the usual least square criterion. Both kinds of methods are presented and illustrated thanks to the R software on an astronomical dataset.
How to use linear regression and correlation in quantitative method comparison studies.
Twomey, P J; Kroll, M H
2008-04-01
Linear regression methods try to determine the best linear relationship between data points while correlation coefficients assess the association (as opposed to agreement) between the two methods. Linear regression and correlation play an important part in the interpretation of quantitative method comparison studies. Their major strength is that they are widely known and as a result both are employed in the vast majority of method comparison studies. While previously performed by hand, the availability of statistical packages means that regression analysis is usually performed by software packages including MS Excel, with or without the software programe Analyze-it as well as by other software packages. Such techniques need to be employed in a way that compares the agreement between the two methods examined and more importantly, because we are dealing with individual patients, whether the degree of agreement is clinically acceptable. Despite their use for many years, there is a lot of ignorance about the validity as well as the pros and cons of linear regression and correlation techniques. This review article describes the types of linear regression and regression (parametric and non-parametric methods) and the necessary general and specific requirements. The selection of the type of regression depends on where one has been trained, the tradition of the laboratory and the availability of adequate software.
Graphical Description of Johnson-Neyman Outcomes for Linear and Quadratic Regression Surfaces.
ERIC Educational Resources Information Center
Schafer, William D.; Wang, Yuh-Yin
A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn…
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
NASA Astrophysics Data System (ADS)
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
NASA Astrophysics Data System (ADS)
Zhu, Dazhou; Ji, Baoping; Meng, Chaoying; Shi, Bolin; Tu, Zhenhua; Qing, Zhaoshen
Hybrid linear analysis (HLA), partial least-squares (PLS) regression, and the linear least square support vector machine (LSSVM) were used to determinate the soluble solids content (SSC) of apple by Fourier transform near-infrared (FT-NIR) spectroscopy. The performance of these three linear regression methods was compared. Results showed that HLA could be used for the analysis of complex solid samples such as apple. The predictive ability of SSC model constructed by HLA was comparable to that of PLS. HLA was sensitive to outliers, thus the outliers should be eliminated before HLA calibration. Linear LSSVM performed better than PLS and HLA. Direct orthogonal signal correction (DOSC) pretreatment was effective for PLS and linear LSSVM, but not suitable for HLA. The combination of DOSC and linear LSSVM had good generalization ability and was not sensitive to outliers, so it is a promising method for linear multivariate calibration.
Predicting the occurrence of wildfires with binary structured additive regression models.
Ríos-Pena, Laura; Kneib, Thomas; Cadarso-Suárez, Carmen; Marey-Pérez, Manuel
2017-02-01
Wildfires are one of the main environmental problems facing societies today, and in the case of Galicia (north-west Spain), they are the main cause of forest destruction. This paper used binary structured additive regression (STAR) for modelling the occurrence of wildfires in Galicia. Binary STAR models are a recent contribution to the classical logistic regression and binary generalized additive models. Their main advantage lies in their flexibility for modelling non-linear effects, while simultaneously incorporating spatial and temporal variables directly, thereby making it possible to reveal possible relationships among the variables considered. The results showed that the occurrence of wildfires depends on many covariates which display variable behaviour across space and time, and which largely determine the likelihood of ignition of a fire. The joint possibility of working on spatial scales with a resolution of 1 × 1 km cells and mapping predictions in a colour range makes STAR models a useful tool for plotting and predicting wildfire occurrence. Lastly, it will facilitate the development of fire behaviour models, which can be invaluable when it comes to drawing up fire-prevention and firefighting plans.
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression. PMID:22973104
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing
2014-01-01
Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas. PMID:24936948
Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing
2014-06-16
Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.
NASA Technical Reports Server (NTRS)
Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.
1995-01-01
A major goal of airborne imaging spectrometry is to estimate the biochemical composition of vegetation canopies from reflectance spectra. Remotely-sensed estimates of foliar biochemical concentrations of forests would provide valuable indicators of ecosystem function at regional and eventually global scales. Empirical research has shown a relationship exists between the amount of radiation reflected from absorption features and the concentration of given biochemicals in leaves and canopies (Matson et al., 1994, Johnson et al., 1994). A technique commonly used to determine which wavelengths have the strongest correlation with the biochemical of interest is unguided (stepwise) multiple regression. Wavelengths are entered into a multivariate regression equation, in their order of importance, each contributing to the reduction of the variance in the measured biochemical concentration. A significant problem with the use of stepwise regression for determining the correlation between biochemical concentration and spectra is that of 'overfitting' as there are significantly more wavebands than biochemical measurements. This could result in the selection of wavebands which may be more accurately attributable to noise or canopy effects. In addition, there is a real problem of collinearity in that the individual biochemical concentrations may covary. A strong correlation between the reflectance at a given wavelength and the concentration of a biochemical of interest, therefore, may be due to the effect of another biochemical which is closely related. Furthermore, it is not always possible to account for potentially suitable waveband omissions in the stepwise selection procedure. This concern about the suitability of stepwise regression has been identified and acknowledged in a number of recent studies (Wessman et al., 1988, Curran, 1989, Curran et al., 1992, Peterson and Hubbard, 1992, Martine and Aber, 1994, Kupiec, 1994). These studies have pointed to the lack of a physical
Comparison of Linear and Non-Linear Regression Models to Estimate Leaf Area Index of Dryland Shrubs.
NASA Astrophysics Data System (ADS)
Dashti, H.; Glenn, N. F.; Ilangakoon, N. T.; Mitchell, J.; Dhakal, S.; Spaete, L.
2015-12-01
Leaf area index (LAI) is a key parameter in global ecosystem studies. LAI is considered a forcing variable in land surface processing models since ecosystem dynamics are highly correlated to LAI. In response to environmental limitations, plants in semiarid ecosystems have smaller leaf area, making accurate estimation of LAI by remote sensing a challenging issue. Optical remote sensing (400-2500 nm) techniques to estimate LAI are based either on radiative transfer models (RTMs) or statistical approaches. Considering the complex radiation field of dry ecosystems, simple 1-D RTMs lead to poor results, and on the other hand, inversion of more complex 3-D RTMs is a demanding task which requires the specification of many variables. A good alternative to physical approaches is using methods based on statistics. Similar to many natural phenomena, there is a non-linear relationship between LAI and top of canopy electromagnetic waves reflected to optical sensors. Non-linear regression models can better capture this relationship. However, considering the problem of a few numbers of observations in comparison to the feature space (n
linear models. In this study linear versus non-linear regression techniques were investigated to estimate LAI. Our study area is located in southwestern Idaho, Great Basin. Sagebrush (Artemisia tridentata spp) serves a critical role in maintaining the structure of this ecosystem. Using a leaf area meter (Accupar LP-80), LAI values were measured in the field. Linear Partial Least Square regression and non-linear, tree based Random Forest regression have been implemented to estimate the LAI of sagebrush from hyperspectral data (AVIRIS-ng) collected in late summer 2014. Cross validation of results indicate that PLS can provide comparable results to Random Forest.
Karadag, Dogan; Koc, Yunus; Turan, Mustafa; Ozturk, Mustafa
2007-06-01
Ammonium ion exchange from aqueous solution using clinoptilolite zeolite was investigated at laboratory scale. Batch experimental studies were conducted to evaluate the effect of various parameters such as pH, zeolite dosage, contact time, initial ammonium concentration and temperature. Freundlich and Langmuir isotherm models and pseudo-second-order model were fitted to experimental data. Linear and non-linear regression methods were compared to determine the best fitting of isotherm and kinetic model to experimental data. The rate limiting mechanism of ammonium uptake by zeolite was determined as chemical exchange. Non-linear regression has better performance for analyzing experimental data and Freundlich model was better than Langmuir to represent equilibrium data.
Predicting students' success at pre-university studies using linear and logistic regressions
NASA Astrophysics Data System (ADS)
Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir
2014-09-01
The study is aimed to find the most suitable model that could predict the students' success at the medical pre-university studies, Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were the national high school exit examination-Sijil Pelajaran Malaysia (SPM) achievements such as Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia results as well as gender and high school background factors. The outcomes showed that there is a significant difference in the final CGPA, Biology and Mathematics subjects at pre-university by gender factor, while by high school background also for Mathematics subject. In general, the correlation between the academic achievements at the high school and medical pre-university is moderately significant at α-level of 0.05, except for languages subjects. It was found also that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models were able to give the probability that is almost accurate with the real case. Hence, it could be used to identify successful students who are qualified to enter the CUCMS medical faculty before accepting any students to its foundation program.
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case, where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p> n) where p can be increasing exponentially with n. Finally, we show the consistency, n½ --d-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement errors models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the
Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi
2013-01-01
Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, P<0.0001; linear regression modeling, R(2) = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
Jalal, Hawre; Goldhaber-Fiebert, Jeremy D.; Kuntz, Karen M.
2016-01-01
Decision makers often desire both guidance on the most cost-effective interventions given current knowledge and also the value of collecting additional information to improve the decisions made [i.e., from value of information (VOI) analysis]. Unfortunately, VOI analysis remains underutilized due to the conceptual, mathematical and computational challenges of implementing Bayesian decision theoretic approaches in models of sufficient complexity for real-world decision making. In this study, we propose a novel practical approach for conducting VOI analysis using a combination of probabilistic sensitivity analysis, linear regression metamodeling, and unit normal loss integral function – a parametric approach to VOI analysis. We adopt a linear approximation and leverage a fundamental assumption of VOI analysis which requires that all sources of prior uncertainties be accurately specified. We provide examples of the approach and show that the assumptions we make do not induce substantial bias but greatly reduce the computational time needed to perform VOI analysis. Our approach avoids the need to analytically solve or approximate joint Bayesian updating, requires only one set of probabilistic sensitivity analysis simulations, and can be applied in models with correlated input parameters. PMID:25840900
A Linear Regression and Markov Chain Model for the Arabian Horse Registry
1993-04-01
as a tax deduction? Yes No T-4367 68 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be... (Mark one...E L T-4367 A Linear Regression and Markov Chain Model For the Arabian Horse Registry Accesion For NTIS CRA&I UT 7 4:iC=D 5 D-IC JA" LI J:13tjlC,3 lO...the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses . A linear regression model was utilized to
ERIC Educational Resources Information Center
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Dufrenois, F; Noyer, J C
2013-02-01
Linear discriminant analysis, such as Fisher's criterion, is a statistical learning tool traditionally devoted to separating a training dataset into two or even several classes by the way of linear decision boundaries. In this paper, we show that this tool can formalize the robust linear regression problem as a robust estimator will do. More precisely, we develop a one-class Fischer's criterion in which the maximization provides both the regression parameters and the separation of the data in two classes: typical data and atypical data or outliers. This new criterion is built on the statistical properties of the subspace decomposition of the hat matrix. From this angle, we improve the discriminative properties of the hat matrix which is traditionally used as outlier diagnostic measure in linear regression. Naturally, we call this new approach discriminative hat matrix. The proposed algorithm is fully nonsupervised and needs only the initialization of one parameter. Synthetic and real datasets are used to study the performance both in terms of regression and classification of the proposed approach. We also illustrate its potential application to image recognition and fundamental matrix estimation in computer vision.
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course
ERIC Educational Resources Information Center
Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna
2010-01-01
Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
NASA Astrophysics Data System (ADS)
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique.
NASA Technical Reports Server (NTRS)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Sulthana, Ayesha; Latha, K C; Imran, Mohammad; Rathan, Ramya; Sridhar, R; Balasubramanian, S
2014-01-01
Fuzzy principal component regression (FPCR) is proposed to model the non-linear process of sewage treatment plant (STP) data matrix. The dimension reduction of voluminous data was done by principal component analysis (PCA). The PCA score values were partitioned by fuzzy-c-means (FCM) clustering, and a Takagi-Sugeno-Kang (TSK) fuzzy model was built based on the FCM functions. The FPCR approach was used to predict the reduction in chemical oxygen demand (COD) and biological oxygen demand (BOD) of treated wastewater of Vidyaranyapuram STP with respect to the relations modeled between fuzzy partitioned PCA scores and target output. The designed FPCR model showed the ability to capture the behavior of non-linear processes of STP. The predicted values of reduction in COD and BOD were analyzed by performing the linear regression analysis. The predicted values for COD and BOD reduction showed positive correlation with the observed data.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
Boosted structured additive regression for Escherichia coli fed-batch fermentation modeling.
Melcher, Michael; Scharl, Theresa; Luchner, Markus; Striedner, Gerald; Leisch, Friedrich
2017-02-01
The quality of biopharmaceuticals and patients' safety are of highest priority and there are tremendous efforts to replace empirical production process designs by knowledge-based approaches. Main challenge in this context is that real-time access to process variables related to product quality and quantity is severely limited. To date comprehensive on- and offline monitoring platforms are used to generate process data sets that allow for development of mechanistic and/or data driven models for real-time prediction of these important quantities. Ultimate goal is to implement model based feed-back control loops that facilitate online control of product quality. In this contribution, we explore structured additive regression (STAR) models in combination with boosting as a variable selection tool for modeling the cell dry mass, product concentration, and optical density on the basis of online available process variables and two-dimensional fluorescence spectroscopic data. STAR models are powerful extensions of linear models allowing for inclusion of smooth effects or interactions between predictors. Boosting constructs the final model in a stepwise manner and provides a variable importance measure via predictor selection frequencies. Our results show that the cell dry mass can be modeled with a relative error of about ±3%, the optical density with ±6%, the soluble protein with ±16%, and the insoluble product with an accuracy of ±12%. Biotechnol. Bioeng. 2017;114: 321-334. © 2016 Wiley Periodicals, Inc.
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID:27806075
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.
Hoffman, Haydn; Lee, Sunghoon Ivan; Garst, Jordan H.; Lu, Derek S.; Li, Charles H.; Nagasawa, Daniel T.; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H.; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C.
2016-01-01
This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R2) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R2 = 0.452; MAD = 0.0887; p = 1.17 × 10−3). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R2 = 0.932; MAD = 0.0283; p = 5.73 × 10−12). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate. PMID:26115898
2014-09-18
ADVANCES IN SCA AND RF-DNA FINGERPRINTING THROUGH ENHANCED LINEAR REGRESSION ATTACKS AND APPLICATION OF RANDOM FOREST CLASSIFIERS DISSERTATION Hiren...SCA AND RF-DNA FINGERPRINTING THROUGH ENHANCED LINEAR REGRESSION ATTACKS AND APPLICATION OF RANDOM FOREST CLASSIFIERS DISSERTATION Presented to the...APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED AFIT-ENG-DS-14-S-03 ADVANCES IN SCA AND RF-DNA FINGERPRINTING THROUGH ENHANCED LINEAR REGRESSION ATTACKS
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
SERF: A Simple, Effective, Robust, and Fast Image Super-Resolver From Cascaded Linear Regression.
Hu, Yanting; Wang, Nannan; Tao, Dacheng; Gao, Xinbo; Li, Xuelong
2016-09-01
Example learning-based image super-resolution techniques estimate a high-resolution image from a low-resolution input image by relying on high- and low-resolution image pairs. An important issue for these techniques is how to model the relationship between high- and low-resolution image patches: most existing complex models either generalize hard to diverse natural images or require a lot of time for model training, while simple models have limited representation capability. In this paper, we propose a simple, effective, robust, and fast (SERF) image super-resolver for image super-resolution. The proposed super-resolver is based on a series of linear least squares functions, namely, cascaded linear regression. It has few parameters to control the model and is thus able to robustly adapt to different image data sets and experimental settings. The linear least square functions lead to closed form solutions and therefore achieve computationally efficient implementations. To effectively decrease these gaps, we group image patches into clusters via k-means algorithm and learn a linear regressor for each cluster at each iteration. The cascaded learning process gradually decreases the gap of high-frequency detail between the estimated high-resolution image patch and the ground truth image patch and simultaneously obtains the linear regression parameters. Experimental results show that the proposed method achieves superior performance with lower time consumption than the state-of-the-art methods.
Agha, Salah R; Alnahhal, Mohammed J
2012-11-01
The current study investigates the possibility of obtaining the anthropometric dimensions, critical to school furniture design, without measuring all of them. The study first selects some anthropometric dimensions that are easy to measure. Two methods are then used to check if these easy-to-measure dimensions can predict the dimensions critical to the furniture design. These methods are multiple linear regression and neural networks. Each dimension that is deemed necessary to ergonomically design school furniture is expressed as a function of some other measured anthropometric dimensions. Results show that out of the five dimensions needed for chair design, four can be related to other dimensions that can be measured while children are standing. Therefore, the method suggested here would definitely save time and effort and avoid the difficulty of dealing with students while measuring these dimensions. In general, it was found that neural networks perform better than multiple linear regression in the current study.
User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)
Eng, Ken; Chen, Yin-Yu; Kiang, Julie.E.
2009-01-01
Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.
Genome-enabled prediction using the BLR (Bayesian Linear Regression) R-package.
de Los Campos, Gustavo; Pérez, Paulino; Vazquez, Ana I; Crossa, José
2013-01-01
The BLR (Bayesian linear regression) package of R implements several Bayesian regression models for continuous traits. The package was originally developed for implementing the Bayesian LASSO (BL) of Park and Casella (J Am Stat Assoc 103(482):681-686, 2008), extended to accommodate fixed effects and regressions on pedigree using methods described by de los Campos et al. (Genetics 182(1):375-385, 2009). In 2010 we further developed the code into an R-package, reprogrammed some internal aspects of the algorithm in the C language to increase computational speed, and further documented the package (Plant Genome J 3(2):106-116, 2010). The first version of BLR was launched in 2010 and since then the package has been used for multiple publications and is being routinely used for genomic evaluations in some animal and plant breeding programs. In this article we review the models implemented by BLR and illustrate the use of the package with examples.
Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J
2015-08-01
Regression-based prosthesis control using surface electromyography (EMG) has demonstrated real-time simultaneous control of multiple degrees of freedom (DOFs) in transradial amputees. However, these systems have been limited to control of wrist DOFs. Use of intramuscular EMG has shown promise for both wrist and hand control in able-bodied subjects, but to date has not been evaluated in amputee subjects. The objective of this study was to evaluate two regression-based simultaneous control methods using intramuscular EMG in transradial amputees and compare their performance to able-bodied subjects. Two transradial amputees and sixteen able-bodied subjects used fine wire EMG recorded from six forearm muscles to control three wrist/hand DOFs: wrist rotation, wrist flexion/extension, and hand open/close. Both linear regression and probability-weighted regression systems were evaluated in a virtual Fitts' Law test. Though both amputee subjects initially produced worse performance metrics than the able-bodied subjects, the amputee subject who completed multiple experimental blocks of the Fitts' law task demonstrated substantial learning. This subject's performance was within the range of able-bodied subjects by the end of the experiment. Both amputee subjects also showed improved performance when using probability-weighted regression for targets requiring use of only one DOF, and mirrored statistically significant differences observed with able-bodied subjects. These results indicate that amputee subjects may require more learning to achieve similar performance metrics as able-bodied subjects. These results also demonstrate that comparative findings between linear and probability-weighted regression with able-bodied subjects reflect performance differences when used by the amputee population.
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-01-15
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions namely coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2) was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.
Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo
2015-08-01
Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-03-01
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of methylene blue sorption by activated carbon. The r2 was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions, namely coefficient of determination (r2), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r2 was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K2 was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
Zainudin, Suhaila; Arif, Shereena M.
2017-01-01
Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion on specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by the past studies were not specifically geared towards proving the ability of GRN prediction methods in avoiding the occurrences of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR) to infer GRN from gene expression data and to avoid wrongly inferring of an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations of the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting the random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure that had been proposed in this work. The experiment revealed that the number of cascade errors had been very minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5. PMID:28250767
Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models.
Elliott, Michael R
2009-03-01
In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create "data driven" weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical.
Distributed Monitoring of the R(sup 2) Statistic for Linear Regression
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.
2011-01-01
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
Research on the multiple linear regression in non-invasive blood glucose measurement.
Zhu, Jianming; Chen, Zhencheng
2015-01-01
A non-invasive blood glucose measurement sensor and the data process algorithm based on the metabolic energy conservation (MEC) method are presented in this paper. The physiological parameters of human fingertip can be measured by various sensing modalities, and blood glucose value can be evaluated with the physiological parameters by the multiple linear regression analysis. Five methods such as enter, remove, forward, backward and stepwise in multiple linear regression were compared, and the backward method had the best performance. The best correlation coefficient was 0.876 with the standard error of the estimate 0.534, and the significance was 0.012 (sig. <0.05), which indicated the regression equation was valid. The Clarke error grid analysis was performed to compare the MEC method with the hexokinase method, using 200 data points. The correlation coefficient R was 0.867 and all of the points were located in Zone A and Zone B, which shows the MEC method provides a feasible and valid way for non-invasive blood glucose measurement.
A note on the use of multiple linear regression in molecular ecology.
Frasier, Timothy R
2016-03-01
Multiple linear regression analyses (also often referred to as generalized linear models--GLMs, or generalized linear mixed models--GLMMs) are widely used in the analysis of data in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misinterpreted, which can lead to incorrect interpretations and conclusions within individual studies, and can propagate to wider-spread errors in the general understanding of a topic. The primary issue revolves around the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficients associated with each independent variable are often interpreted as the independent effect of each predictor variable on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result and to provide alternative approaches for obtaining the desired information.
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2016-04-19
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
Li, Yanming; Zhu, Ji
2015-01-01
Summary We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functioning groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. PMID:25732839
Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.
Choi, Jae-Seok; Kim, Munchurl
2017-03-01
Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower
Describing Adequacy of cure with maximum hardness ratios and non-linear regression.
Bouschlicher, Murray; Berning, Kristen; Qian, Fang
2008-01-01
Knoop Hardness (KH) ratios (HR) > or = 80% are commonly used as criteria for the adequate cure of a composite. These per-specimen HRs can be misleading, as both numerator and denominator may increase concurrently, prior to reaching an asymptotic, top-surface maximum hardness value (H(MAX)). Extended cure times were used to establish H(MAX) and descriptive statistics, and non-linear regression analysis were used to describe the relationship between exposure duration and HR and predict the time required for HR-H(MAX) = 80%. Composite samples 2.00 x 5.00 mm diameter (n = 5/grp) were cured for 10 seconds, 20 seconds, 40 seconds, 60 seconds, 90 seconds, 120 seconds, 180 seconds and 240 seconds in a 2-composite x 2-light curing unit design. A microhybrid (Point 4, P4) or microfill resin (Heliomolar, HM) composite was cured with a QTH or LED light curing unit and then stored in the dark for 24 hours prior to KH testing. Non-linear regression was calculated with: H = (H(MAX)-c)(1-e(-kt)) +c, H(MAX) = maximum hardness (a theoretical asymptotic value), c = constant (t = 0), k = rate constant and t = exposure duration describes the relationship between radiant exposure (irradiance x time) and HRs. Exposure durations for HR-H(MAX) = 80% were calculated. Two-sample t-tests for pairwise comparisons evaluated relative performance of the light curing units for similar surface x composite x exposure (10-90s). A good measure of goodness-of-fit of the non-linear regression, r2, ranged from 0.68-0.95. (mean = 0.82). Microhybrid (P4) exposure to achieve HR-H(MAX = 80% was 21 seconds for QTH and 34 seconds for the LED light curing unit. Corresponding values for microfill (HM) were 71 and 74 seconds, respectively. P4 HR-H(MAX) of LED vs QTH was statistically similar for 10 to 40 seconds, while HM HR-H(MAX) of LED was significantly lower than QTH for 10 to 40 seconds. It was concluded that redefined hardness ratios based on maximum hardness used in conjunction with non-linear regression
An Empirical Likelihood Method for Semiparametric Linear Regression with Right Censored Data
Fang, Kai-Tai; Li, Gang; Lu, Xuyang; Qin, Hong
2013-01-01
This paper develops a new empirical likelihood method for semiparametric linear regression with a completely unknown error distribution and right censored survival data. The method is based on the Buckley-James (1979) estimating equation. It inherits some appealing properties of the complete data empirical likelihood method. For example, it does not require variance estimation which is problematic for the Buckley-James estimator. We also extend our method to incorporate auxiliary information. We compare our method with the synthetic data empirical likelihood of Li and Wang (2003) using simulations. We also illustrate our method using Stanford heart transplantation data. PMID:23573169
NASA Technical Reports Server (NTRS)
Barrett, C. A.
1985-01-01
Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important accounting for 60 percent of the explained variability hot corrosion attack.
Antelis, Javier M; Montesano, Luis; Ramos-Murguialday, Ander; Birbaumer, Niels; Minguez, Javier
2013-01-01
Several works have reported on the reconstruction of 2D/3D limb kinematics from low-frequency EEG signals using linear regression models based on positive correlation values between the recorded and the reconstructed trajectories. This paper describes the mathematical properties of the linear model and the correlation evaluation metric that may lead to a misinterpretation of the results of this type of decoders. Firstly, the use of a linear regression model to adjust the two temporal signals (EEG and velocity profiles) implies that the relevant component of the signal used for decoding (EEG) has to be in the same frequency range as the signal to be decoded (velocity profiles). Secondly, the use of a correlation to evaluate the fitting of two trajectories could lead to overly-optimistic results as this metric is invariant to scale. Also, the correlation has a non-linear nature that leads to higher values for sinus/cosinus-like signals at low frequencies. Analysis of these properties on the reconstruction results was carried out through an experiment performed in line with previous studies, where healthy participants executed predefined reaching movements of the hand in 3D space. While the correlations of limb velocity profiles reconstructed from low-frequency EEG were comparable to studies in this domain, a systematic statistical analysis revealed that these results were not above the chance level. The empirical chance level was estimated using random assignments of recorded velocity profiles and EEG signals, as well as combinations of randomly generated synthetic EEG with recorded velocity profiles and recorded EEG with randomly generated synthetic velocity profiles. The analysis shows that the positive correlation results in this experiment cannot be used as an indicator of successful trajectory reconstruction based on a neural correlate. Several directions are herein discussed to address the misinterpretation of results as well as the implications on previous
da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues
2015-01-01
This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil.
NASA Astrophysics Data System (ADS)
Setyaningsih, S.
2017-01-01
The main element to build a leading university requires lecturer commitment in a professional manner. Commitment is measured through willpower, loyalty, pride, loyalty, and integrity as a professional lecturer. A total of 135 from 337 university lecturers were sampled to collect data. Data were analyzed using validity and reliability test and multiple linear regression. Many studies have found a link on the commitment of lecturers, but the basic cause of the causal relationship is generally neglected. These results indicate that the professional commitment of lecturers affected by variables empowerment, academic culture, and trust. The relationship model between variables is composed of three substructures. The first substructure consists of endogenous variables professional commitment and exogenous three variables, namely the academic culture, empowerment and trust, as well as residue variable ɛ y . The second substructure consists of one endogenous variable that is trust and two exogenous variables, namely empowerment and academic culture and the residue variable ɛ 3. The third substructure consists of one endogenous variable, namely the academic culture and exogenous variables, namely empowerment as well as residue variable ɛ 2. Multiple linear regression was used in the path model for each substructure. The results showed that the hypothesis has been proved and these findings provide empirical evidence that increasing the variables will have an impact on increasing the professional commitment of the lecturers.
Pagowski, M O; Grell, G A; Devenyi, D; Peckham, S E; McKeen, S A; Gong, W; Monache, L D; McHenry, J N; McQueen, J; Lee, P
2006-02-02
Forecasts from seven air quality models and surface ozone data collected over the eastern USA and southern Canada during July and August 2004 provide a unique opportunity to assess benefits of ensemble-based ozone forecasting and devise methods to improve ozone forecasts. In this investigation, past forecasts from the ensemble of models and hourly surface ozone measurements at over 350 sites are used to issue deterministic 24-h forecasts using a method based on dynamic linear regression. Forecasts of hourly ozone concentrations as well as maximum daily 8-h and 1-h averaged concentrations are considered. It is shown that the forecasts issued with the application of this method have reduced bias and root mean square error and better overall performance scores than any of the ensemble members and the ensemble average. Performance of the method is similar to another method based on linear regression described previously by Pagowski et al., but unlike the latter, the current method does not require measurements from multiple monitors since it operates on individual time series. Improvement in the forecasts can be easily implemented and requires minimal computational cost.
NASA Astrophysics Data System (ADS)
Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.
2015-09-01
One method to acquire multispectral images is to sequentially capture a series of images where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy to capturing multispectral data is to infer this data using sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data. However, these assumptions may not always hold true for all data. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, which is an ensemble of decision trees and trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.
Aboveground biomass and carbon stocks modelling using non-linear regression model
NASA Astrophysics Data System (ADS)
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in the carbon estimation for the tropical forest due to the variation biodiversity of species and the complex structure of tropical rain forest. Nevertheless, the tropical rainforest holds the most extensive forest in the world with the vast diversity of tree with layered canopies. With the usage of optical sensor integrate with empirical models is a common way to assess the AGB. Using the regression, the linkage between remote sensing and a biophysical parameter of the forest may be made. Therefore, this paper exemplifies the accuracy of non-linear regression equation of quadratic function to estimate the AGB and carbon stocks for the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameter field plots with the remotely-sensed data using nonlinear regression model. The result showed that there is a good relationship between crown projection area (CPA) and carbon stocks (CS) with Pearson Correlation (p < 0.01), the coefficient of correlation (r) is 0.671. The study concluded that the integration of Worldview-3 imagery with the canopy height model (CHM) raster based LiDAR were useful in order to quantify the AGB and carbon stocks for a larger sample area of the lowland Dipterocarp forest.
Yoneoka, Daisuke; Henmi, Masayuki
2016-12-16
Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd.
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions
NASA Astrophysics Data System (ADS)
Drouard, Vincent; Horaud, Radu; Deleforge, Antoine; Ba, Sileye; Evangelidis, Georgios
2017-03-01
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose tu use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available datasets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.
Efficient least angle regression for identification of linear-in-the-parameters models.
Zhao, Wanqing; Beach, Thomas H; Rezgui, Yacine
2017-02-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm.
Uyak, Vedat; Ozdemir, Kadir; Toroz, Ismail
2007-06-01
Oxidation of raw water with chlorine results in formation of trihalomethanes (THM) and haloacetic acids (HAA). Factors affecting their concentrations have been found to be organic matter type and concentration, pH, temperature, chlorine dose, contact time and bromide concentration, but the mechanisms of their formation are still under investigation. Within this scope, chlorination experiments have been conducted with water reservoirs from Terkos, Buyukcekmece and Omerli lakes, Istanbul, with different water quality regarding bromide concentration and organic matter content. The factors studied were pH, contact time, chlorine dose, and specific ultraviolet absorbance (SUVA). The determination of disinfection by-products (DBP) was carried out by gas chromatography techniques. Statistical analysis of the results was focused on the development of multiple regression models for predicting the concentrations of total THM and total HAA based on the use of pH, contact time, chlorine dose, and SUVA. The developed models provided satisfactory estimations of the concentrations of the DBP and the model regression coefficients of THM and HAA are 0.88 and 0.61, respectively. Further, the Durbin-Watson values confirm the reliability of the two models. The results indicate that under these experimental conditions which indicate the variations of pH, chlorine dosages, contact time, and SUVA values, the formation of THM and HAA in water can be described by the multiple linear regression technique.
Multivariate linear regression of high-dimensional fMRI data with multiple target variables.
Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia
2014-05-01
Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets.
NASA Astrophysics Data System (ADS)
Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei
2014-10-01
Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, which also has the difficulty in capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse model were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. PROSPECT model) was employed to simulate the leaf reflectance where wavelength varies from 400 to 780 nm at 1 nm interval, and then these values were treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were taken as target to build the regression model respectively. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures as 70% of these samples were applied for training while the last 30% for model validation. Reflectance (r) and its mathematic transformations (1/r and log (1/r)) were all employed to build regression model respectively. Results showed fair agreements between pigments and simulated reflectance with all adjusted coefficients of determination (R2) larger than 0.8 as 6 wavebands were selected to build the SMLR model. The largest value of R2 for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematic transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate the chlorophyll and carotenoids and their ratio based on statistical model with leaf reflectance data.
ERIC Educational Resources Information Center
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
Ghazali, Nurul Adyani; Ramli, Nor Azam; Yahaya, Ahmad Shukri; Yusof, Noor Faizah Fitri M D; Sansuddin, Nurulilyana; Al Madhoun, Wesam Ahmed
2010-06-01
Analysis and forecasting of air quality parameters are important topics of atmospheric and environmental research today due to the health impact caused by air pollution. This study examines transformation of nitrogen dioxide (NO(2)) into ozone (O(3)) at urban environment using time series plot. Data on the concentration of environmental pollutants and meteorological variables were employed to predict the concentration of O(3) in the atmosphere. Possibility of employing multiple linear regression models as a tool for prediction of O(3) concentration was tested. Results indicated that the presence of NO(2) and sunshine influence the concentration of O(3) in Malaysia. The influence of the previous hour ozone on the next hour concentrations was also demonstrated.
Chicken barn climate and hazardous volatile compounds control using simple linear regression and PID
NASA Astrophysics Data System (ADS)
Abdullah, A. H.; Bakar, M. A. A.; Shukor, S. A. A.; Saad, F. S. A.; Kamis, M. S.; Mustafa, M. H.; Khalid, N. S.
2016-07-01
The hazardous volatile compounds from chicken manure in chicken barn are potentially to be a health threat to the farm animals and workers. Ammonia (NH3) and hydrogen sulphide (H2S) produced in chicken barn are influenced by climate changes. The Electronic Nose (e-nose) is used for the barn's air, temperature and humidity data sampling. Simple Linear Regression is used to identify the correlation between temperature-humidity, humidity-ammonia and ammonia-hydrogen sulphide. MATLAB Simulink software was used for the sample data analysis using PID controller. Results shows that the performance of PID controller using the Ziegler-Nichols technique can improve the system controller to control climate in chicken barn.
NASA Astrophysics Data System (ADS)
Tai, Shen-Chuan; Chen, Peng-Yu; Chao, Chian-Yen
2016-07-01
The Consultative Committee for Space Data Systems proposed an efficient image compression standard that can do lossless compression (CCSDS-ICS). CCSDS-ICS is the most widely utilized standard for satellite communications. However, the original CCSDS-ICS is weak in terms of error resilience with even a single incorrect bit possibly causing numerous missing pixels. A restoration algorithm based on the neighborhood similar pixel interpolator is proposed to fill in missing pixels. The linear regression model is used to generate the reference image from other panchromatic or multispectral images. Furthermore, an adaptive search window is utilized to sieve out similar pixels from the pixels in the search region defined in the neighborhood similar pixel interpolator. The experimental results show that the proposed methods are capable of reconstructing missing regions with good visual quality.
Yoo, Yun Joo; Sun, Lei; Poirier, Julia G.; Paterson, Andrew D.
2016-01-01
ABSTRACT By jointly analyzing multiple variants within a gene, instead of one at a time, gene‐based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster‐specific effects in a quadratic sum of squares and cross‐products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well‐powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P‐value, variance‐component, and principal‐component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene‐specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome‐wide analysis. The cluster construction of the MLC test statistics helps reveal within‐gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. PMID:27885705
Multiple linear and principal component regressions for modelling ecotoxicity bioassay response.
Gomes, Ana I; Pires, José C M; Figueiredo, Sónia A; Boaventura, Rui A R
2014-01-01
The ecotoxicological response of the living organisms in an aquatic system depends on the physical, chemical and bacteriological variables, as well as the interactions between them. An important challenge to scientists is to understand the interaction and behaviour of factors involved in a multidimensional process such as the ecotoxicological response. With this aim, multiple linear regression (MLR) and principal component regression were applied to the ecotoxicity bioassay response of Chlorella vulgaris and Vibrio fischeri in water collected at seven sites of Leça river during five monitoring campaigns (February, May, June, August and September of 2006). The river water characterization included the analysis of 22 physicochemical and 3 microbiological parameters. The model that best fitted the data was MLR, which shows: (i) a negative correlation with dissolved organic carbon, zinc and manganese, and a positive one with turbidity and arsenic, regarding C. vulgaris toxic response; (ii) a negative correlation with conductivity and turbidity and a positive one with phosphorus, hardness, iron, mercury, arsenic and faecal coliforms, concerning V. fischeri toxic response. This integrated assessment may allow the evaluation of the effect of future pollution abatement measures over the water quality of Leça River.
Profile local linear estimation of generalized semiparametric regression model for longitudinal data
Sun, Liuquan; Zhou, Jie
2013-01-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically model such dependence. A K -fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criteria for selecting the link function is proposed to provide better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example. PMID:23471814
The overlooked potential of Generalized Linear Models in astronomy, I: Binomial regression
NASA Astrophysics Data System (ADS)
de Souza, R. S.; Cameron, E.; Killedar, M.; Hilbe, J.; Vilalta, R.; Maio, U.; Biffi, V.; Ciardi, B.; Riggs, J. D.
2015-09-01
Revealing hidden patterns in astronomical data is often the path to fundamental scientific breakthroughs; meanwhile the complexity of scientific enquiry increases as more subtle relationships are sought. Contemporary data analysis problems often elude the capabilities of classical statistical techniques, suggesting the use of cutting edge statistical methods. In this light, astronomers have overlooked a whole family of statistical techniques for exploratory data analysis and robust regression, the so-called Generalized Linear Models (GLMs). In this paper-the first in a series aimed at illustrating the power of these methods in astronomical applications-we elucidate the potential of a particular class of GLMs for handling binary/binomial data, the so-called logit and probit regression techniques, from both a maximum likelihood and a Bayesian perspective. As a case in point, we present the use of these GLMs to explore the conditions of star formation activity and metal enrichment in primordial minihaloes from cosmological hydro-simulations including detailed chemistry, gas physics, and stellar feedback. We predict that for a dark mini-halo with metallicity ≈ 1.3 × 10-4Z⨀, an increase of 1.2 × 10-2 in the gas molecular fraction, increases the probability of star formation occurrence by a factor of 75%. Finally, we highlight the use of receiver operating characteristic curves as a diagnostic for binary classifiers, and ultimately we use these to demonstrate the competitive predictive performance of GLMs against the popular technique of artificial neural networks.
NASA Astrophysics Data System (ADS)
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can actually influence in the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01.The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB) resulting to a total of 96 observations for each variable. The data have undergone a logarithmic transformation particularly the Dependent Variable (y) to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable namely: Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), hence, can actually predict Real GDP (y). The regression analysis displays that 98.7% (coefficient of determination) of the Independent Variables can actually predict the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
NASA Astrophysics Data System (ADS)
Li, Guofa; Huang, Wei; Zheng, Hao; Zhang, Baoqing
2016-02-01
The spectral ratio method (SRM) is widely used to estimate quality factor Q via the linear regression of seismic attenuation under the assumption of a constant Q. However, the estimate error will be introduced when this assumption is violated. For the frequency-dependent Q described by a power-law function, we derived the analytical expression of estimate error as a function of the power-law exponent γ and the ratio of the bandwidth to the central frequency σ . Based on the theoretical analysis, we found that the estimate errors are mainly dominated by the exponent γ , and less affected by the ratio σ . This phenomenon implies that the accuracy of the Q estimate can hardly be improved by adjusting the width and range of the frequency band. Hence, we proposed a two-parameter regression method to estimate the frequency-dependent Q from the nonlinear seismic attenuation. The proposed method was tested using the direct waves acquired by a near-surface cross-hole survey, and its reliability was evaluated in comparison with the result of SRM.
2014-01-01
This paper examined the efficiency of multivariate linear regression (MLR) and artificial neural network (ANN) models in prediction of two major water quality parameters in a wastewater treatment plant. Biochemical oxygen demand (BOD) and chemical oxygen demand (COD) as well as indirect indicators of organic matters are representative parameters for sewer water quality. Performance of the ANN models was evaluated using coefficient of correlation (r), root mean square error (RMSE) and bias values. The computed values of BOD and COD by model, ANN method and regression analysis were in close agreement with their respective measured values. Results showed that the ANN performance model was better than the MLR model. Comparative indices of the optimized ANN with input values of temperature (T), pH, total suspended solid (TSS) and total suspended (TS) for prediction of BOD was RMSE = 25.1 mg/L, r = 0.83 and for prediction of COD was RMSE = 49.4 mg/L, r = 0.81. It was found that the ANN model could be employed successfully in estimating the BOD and COD in the inlet of wastewater biochemical treatment plants. Moreover, sensitive examination results showed that pH parameter have more effect on BOD and COD predicting to another parameters. Also, both implemented models have predicted BOD better than COD. PMID:24456676
Marginal regression approach for additive hazards models with clustered current status data.
Su, Pei-Fang; Chi, Yunchan
2014-01-15
Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of survival function converges at a rate of n(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive the test statistics, whose distributions converge at a rate of n(1/2) , for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated.
Lin, Feng-Chang; Zhu, Jun
2012-01-01
We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.
Linear regression calibration: theoretical framework and empirical results in EPIC, Germany.
Kynast-Wolf, Gisela; Becker, Nikolaus; Kroke, Anja; Brandstetter, Birgit R; Wahrendorf, Jürgen; Boeing, Heiner
2002-01-01
Large scale dietary assessment instruments are usually based on the food frequency technique and have therefore to be tailored to the involved populations with respect to mode of application and inquired food items. In multicenter studies with different populations, the direct comparability of dietary data is therefore a challenge because each local dietary assessment tool might have its specific measurement error. Thus, for risk analysis the direct use of dietary measurements across centers requires a common reference. For example, in the European prospective cohort study EPIC (European Prospective Investigation into Cancer and Nutrition) a 24-hour recall was chosen to serve as such a reference instrument which was based on a highly standardized computer-assisted interview (EPIC-SOFT). The 24-hour recall was applied to a representative subset of EPIC participants in all centers. The theoretical framework of combining multicenter dietary information was previously published in several papers and is called linear regression calibration. It is based on a linear regression of the food frequency questionnaire to the reference. The regression coefficients describe the absolute and proportional scaling bias of the questionnaire with the 24-hour recall taken as reference. This article describes the statistical basis of the calibration approach and presents first empirical results of its application to fruit, cereals and meat consumption in EPIC Germany represented by the two EPIC centers, Heidelberg and Potsdam. It was found that fruit could be measured well by the questionnaire in both centers (lambdacirc; = 0.98 (males) and lambdacirc; = 0.95 (females) in Heidelberg, and lambdacirc; = 0.86 (males) and lambdacirc; = 0.7 (females) in Potsdam), cereals less (lambdacirc; = 0.53 (males) and lambdacirc; = 0.4 (females) in Heidelberg, and lambdacirc; = 0.53 (males) and lambdacirc; = 0.44 (females) in Potsdam), and that the assessment of meat (lambdacirc; = 0.72 (males) and
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.
Fisz, Jacek J
2006-12-07
The optimization approach based on the genetic algorithm (GA) combined with multiple linear regression (MLR) method, is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only one strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of recently introduced the GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi-linear
Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio
2016-10-01
We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two (13) C atoms ((13) C2 -glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of (13) C2 -glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% (13) C2 -glycine at 1 : 1, 1 : 3 and 3 : 1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd.
Structured penalties for functional linear models—partially empirical eigenvectors for regression
Randolph, Timothy W.; Harezlak, Jaroslaw; Feng, Ziding
2012-01-01
One of the challenges with functional data is incorporating geometric structure, or local correlation, into the analysis. This structure is inherent in the output from an increasing number of biomedical technologies, and a functional linear model is often used to estimate the relationship between the predictor functions and scalar responses. Common approaches to the problem of estimating a coefficient function typically involve two stages: regularization and estimation. Regularization is usually done via dimension reduction, projecting onto a predefined span of basis functions or a reduced set of eigenvectors (principal components). In contrast, we present a unified approach that directly incorporates geometric structure into the estimation process by exploiting the joint eigenproperties of the predictors and a linear penalty operator. In this sense, the components in the regression are ‘partially empirical’ and the framework is provided by the generalized singular value decomposition (GSVD). The form of the penalized estimation is not new, but the GSVD clarifies the process and informs the choice of penalty by making explicit the joint influence of the penalty and predictors on the bias, variance and performance of the estimated coefficient function. Laboratory spectroscopy data and simulations are used to illustrate the concepts. PMID:22639702
NASA Astrophysics Data System (ADS)
Shortridge, J.; Guikema, S.; Zaitchik, B. F.
2015-12-01
In the past decade, machine-learning methods for empirical rainfall-runoff modeling have seen extensive development. However, the majority of research has focused on a small number of methods, such as artificial neural networks, while not considering other approaches for non-parametric regression that have been developed in recent years. These methods may be able to achieve comparable predictive accuracy to ANN's and more easily provide physical insights into the system of interest through evaluation of covariate influence. Additionally, these methods could provide a straightforward, computationally efficient way of evaluating climate change impacts in basins where data to support physical hydrologic models is limited. In this paper, we use multiple regression and machine-learning approaches to predict monthly streamflow in five highly-seasonal rivers in the highlands of Ethiopia. We find that generalized additive models, random forests, and cubist models achieve better predictive accuracy than ANNs in many basins assessed and are also able to outperform physical models developed for the same region. We discuss some challenges that could hinder the use of such models for climate impact assessment, such as biases resulting from model formulation and prediction under extreme climate conditions, and suggest methods for preventing and addressing these challenges. Finally, we demonstrate how predictor variable influence can be assessed to provide insights into the physical functioning of data-sparse watersheds.
Regression analysis of mixed recurrent-event and panel-count data with additive rate models.
Zhu, Liang; Zhao, Hui; Sun, Jianguo; Leisenring, Wendy; Robison, Leslie L
2015-03-01
Event-history studies of recurrent events are often conducted in fields such as demography, epidemiology, medicine, and social sciences (Cook and Lawless, 2007, The Statistical Analysis of Recurrent Events. New York: Springer-Verlag; Zhao et al., 2011, Test 20, 1-42). For such analysis, two types of data have been extensively investigated: recurrent-event data and panel-count data. However, in practice, one may face a third type of data, mixed recurrent-event and panel-count data or mixed event-history data. Such data occur if some study subjects are monitored or observed continuously and thus provide recurrent-event data, while the others are observed only at discrete times and hence give only panel-count data. A more general situation is that each subject is observed continuously over certain time periods but only at discrete times over other time periods. There exists little literature on the analysis of such mixed data except that published by Zhu et al. (2013, Statistics in Medicine 32, 1954-1963). In this article, we consider the regression analysis of mixed data using the additive rate model and develop some estimating equation-based approaches to estimate the regression parameters of interest. Both finite sample and asymptotic properties of the resulting estimators are established, and the numerical studies suggest that the proposed methodology works well for practical situations. The approach is applied to a Childhood Cancer Survivor Study that motivated this study.
NASA Astrophysics Data System (ADS)
Gangopadhyay, S.; Clark, M. P.; Rajagopalan, B.
2002-12-01
The success of short term (days to fortnight) streamflow forecasting largely depends on the skill of surface climate (e.g., precipitation and temperature) forecasts at local scales in the individual river basins. The surface climate forecasts are used to drive the hydrologic models for streamflow forecasting. Typically, Medium Range Forecast (MRF) models provide forecasts of large scale circulation variables (e.g. pressures, wind speed, relative humidity etc.) at different levels in the atmosphere on a regular grid - which are then used to "downscale" to the surface climate at locations within the model grid box. Several statistical and dynamical methods are available for downscaling. This paper compares the utility of two statistical downscaling methodologies: (1) multiple linear regression (MLR) and (2) a nonparametric approach based on k-nearest neighbor (k-NN) bootstrap method, in providing local-scale information of precipitation and temperature at a network of stations in the Upper Colorado River Basin. Downscaling to the stations is based on output of large scale circulation variables (i.e. predictors) from the NCEP Medium Range Forecast (MRF) database. Fourteen-day six hourly forecasts are developed using these two approaches, and their forecast skill evaluated. A stepwise regression is performed at each location to select the predictors for the MLR. The k-NN bootstrap technique resamples historical data based on their "nearness" to the current pattern in the predictor space. Prior to resampling a Principal Component Analysis (PCA) is performed on the predictor set to identify a small subset of predictors. Preliminary results using the MLR technique indicate a significant value in the downscaled MRF output in predicting runoff in the Upper Colorado Basin. It is expected that the k-NN approach will match the skill of the MLR approach at individual stations, and will have the added advantage of preserving the spatial co-variability between stations, capturing
Monopole and dipole estimation for multi-frequency sky maps by linear regression
NASA Astrophysics Data System (ADS)
Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.
2017-01-01
We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
Emotional expression in music: contribution, linearity, and additivity of primary musical cues.
Eerola, Tuomas; Friberg, Anders; Bresin, Roberto
2013-01-01
The aim of this study is to manipulate musical cues systematically to determine the aspects of music that contribute to emotional expression, and whether these cues operate in additive or interactive fashion, and whether the cue levels can be characterized as linear or non-linear. An optimized factorial design was used with six primary musical cues (mode, tempo, dynamics, articulation, timbre, and register) across four different music examples. Listeners rated 200 musical examples according to four perceived emotional characters (happy, sad, peaceful, and scary). The results exhibited robust effects for all cues and the ranked importance of these was established by multiple regression. The most important cue was mode followed by tempo, register, dynamics, articulation, and timbre, although the ranking varied across the emotions. The second main result suggested that most cue levels contributed to the emotions in a linear fashion, explaining 77-89% of variance in ratings. Quadratic encoding of cues did lead to minor but significant increases of the models (0-8%). Finally, the interactions between the cues were non-existent suggesting that the cues operate mostly in an additive fashion, corroborating recent findings on emotional expression in music (Juslin and Lindström, 2010).
Emotional expression in music: contribution, linearity, and additivity of primary musical cues
Eerola, Tuomas; Friberg, Anders; Bresin, Roberto
2013-01-01
The aim of this study is to manipulate musical cues systematically to determine the aspects of music that contribute to emotional expression, and whether these cues operate in additive or interactive fashion, and whether the cue levels can be characterized as linear or non-linear. An optimized factorial design was used with six primary musical cues (mode, tempo, dynamics, articulation, timbre, and register) across four different music examples. Listeners rated 200 musical examples according to four perceived emotional characters (happy, sad, peaceful, and scary). The results exhibited robust effects for all cues and the ranked importance of these was established by multiple regression. The most important cue was mode followed by tempo, register, dynamics, articulation, and timbre, although the ranking varied across the emotions. The second main result suggested that most cue levels contributed to the emotions in a linear fashion, explaining 77–89% of variance in ratings. Quadratic encoding of cues did lead to minor but significant increases of the models (0–8%). Finally, the interactions between the cues were non-existent suggesting that the cues operate mostly in an additive fashion, corroborating recent findings on emotional expression in music (Juslin and Lindström, 2010). PMID:23908642
Guisan, A.; Edwards, T.C.; Hastie, T.
2002-01-01
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. ?? 2002 Elsevier Science B.V. All rights reserved.
Two-Dimensional, Supersonic, Linearized Flow with Heat Addition
NASA Technical Reports Server (NTRS)
Lomax, Harvard
1959-01-01
Calculations are presented for the forces on a thin supersonic wing underneath which the air is heated. The analysis is limited principally to linearized theory but nonlinear effects are considered. It is shown that significant advantages to external heating would exist if the heat were added well below and ahead of the wing.
Meija, Juris; Pagliano, Enea; Mester, Zoltán
2014-09-02
Uncertainty of the result from the method of standard addition is often underestimated due to neglect of the covariance between the intercept and the slope. In order to simplify the data analysis from standard addition experiments, we propose x-y coordinate swapping in conventional linear regression. Unlike the ratio of the intercept and slope, which is the result of the traditional method of standard addition, the result of the inverse standard addition is obtained directly from the intercept of the swapped calibration line. Consequently, the uncertainty evaluation becomes markedly simpler. The method is also applicable to nonlinear curves, such as the quadratic model, without incurring any additional complexity.
NASA Astrophysics Data System (ADS)
Buermeyer, Jonas; Gundlach, Matthias; Grund, Anna-Lisa; Grimm, Volker; Spizyn, Alexander; Breckow, Joachim
2016-09-01
This work is part of the analysis of the effects of constructional energy-saving measures to radon concentration levels in dwellings performed on behalf of the German Federal Office for Radiation Protection. In parallel to radon measurements for five buildings, both meteorological data outside the buildings and the indoor climate factors were recorded. In order to access effects of inhabited buildings, the amount of carbon dioxide (CO2) was measured. For a statistical linear regression model, the data of one object was chosen as an example. Three dummy variables were extracted from the process of the CO2 concentration to provide information on the usage and ventilation of the room. The analysis revealed a highly autoregressive model for the radon concentration with additional influence by the natural environmental factors. The autoregression implies a strong dependency on a radon source since it reflects a backward dependency in time. At this point of the investigation, it cannot be determined whether the influence by outside factors affects the source of radon or the habitant’s ventilation behavior resulting in variation of the occurring concentration levels. In any case, the regression analysis might provide further information that would help to distinguish these effects. In the next step, the influence factors will be weighted according to their impact on the concentration levels. This might lead to a model that enables the prediction of radon concentration levels based on the measurement of CO2 in combination with environmental parameters, as well as the development of advices for ventilation.
Turlapaty, Anish C.; Younan, Nicolas H.; Anantharaj, Valentine G
2012-01-01
Currently, the only viable option for a global precipitation product is the merger of several precipitation products from different modalities. In this article, we develop a linear merging methodology based on spatiotemporal regression. Four highresolution precipitation products (HRPPs), obtained through methods including the Climate Prediction Center's Morphing (CMORPH), Geostationary Operational Environmental Satellite-Based Auto-Estimator (GOES-AE), GOES-Based Hydro-Estimator (GOES-HE) and Self-Calibrating Multivariate Precipitation Retrieval (SCAMPR) algorithms, are used in this study. The merged data are evaluated against the Arkansas Red Basin River Forecast Center's (ABRFC's) ground-based rainfall product. The evaluation is performed using the Heidke skill score (HSS) for four seasons, from summer 2007 to spring 2008, and for two different rainfall detection thresholds. It is shown that the merged data outperform all the other products in seven out of eight cases. A key innovation of this machine learning method is that only 6% of the validation data are used for the initial training. The sensitivity of the algorithm to location, distribution of training data, selection of input data sets and seasons is also analysed and presented.
Retrieving soil water contents from soil temperature measurements by using linear regression
NASA Astrophysics Data System (ADS)
Xu, Qin; Zhou, Binbin
2003-11-01
A simple linear regression method is developed to retrieve daily averaged soil water content from diurnal variations of soil temperature measured at three or more depths. The method is applied to Oklahoma Mesonet soil temperature data collected at the depths of 5, 10, and 30 cm during 11 20 June 1995. The retrieved bulk soil water contents are compared with direct measurements for one pair of nearly collocated Mesonet and ARM stations and also compared with the retrievals of a previous method at 14 enhanced Oklahoma Mesonet stations. The results show that the current method gives more persistent retrievals than the previous method. The method is also applied to Oklahoma Mesonet soil temperature data collected at the depths of 5, 25, 60, and 75 cm from the Norman site during 20 30 July 1998 and 1 31 July 2000. The retrieved soil water contents are verified by collocated soil water content measurements with rms differences smaller than the soil water observation error (0.05 m3 m-3). The retrievals are found to be moderately sensitive to random errors (±0.1 K) in the soil temperature observations and errors in the soil type specifications.
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
A Multiple Linear Regression Model For Estimation of Flood Peaks In Baden-wuerttemberg/germany
NASA Astrophysics Data System (ADS)
Casper, M.; Krieger, S.; Ihringer, J.
In water resources planning good estimations of flood peaks are necessary for con- struction planning, for the estimation of the existing risk potential and for the valida- tion of rainfall-runoff models. Generally these indexes are only available through statistical analysis for gauged sites. Furthermore the reliability of the underlying time series can often not be proven be- cause they are too short or of bad quality. Therefore a spatial adjustment of all gauge indexes was conducted before a linear multiple regression model was applied. It now enable us to estimate flood peaks for almost any ungauged site of the study area. The model bases on 8 parameters describing the catchment properties. 7 parameters can be derived directly from digital data including a digital elevation model (catch- ment size, maximum flowlength, center flowlength, weighted slope, annual rainfall, portion of urban resp. forested area). The last parameter is an empirical landscape fac- tor, which allows to consider the regional differences in flood generation. The spatial distribution of this factor has been linked in a first approach to the hydro-geological map of Baden-Wuerttemberg. The overall performance of the model is very good. But for some areas, the determination of the landscape factor is difficult. Further investigations indicated that a more process based approach allows to im- prove the fit of this landscape factor and also the quality of the regionalisation model. By integrating detailed soil information (which is available area wide) some hydro- geological classes could be subdivided in subclasses. By replacing the parameter "weighted slope" by a parameter which better describes the driving forces of flood generation, the model performance could be improved significantly.
2009-01-01
Background The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF) viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1) CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer); (2) CSF HIV-1 viral load and HIV-1 plasma viral load; and (3) CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods Our approach was to use a prospective collection and analysis of pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis and assisted at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006). We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche). Data were processed with Statistix 7.0 software (linear regression analysis). Results Samples from 34 patients were analyzed. CSF leukocyte count showed statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01). No correlation was found with the plasma viral load, CSF protein concentration and cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and the CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123). Conclusion Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
Zhang, Yiwei; Pan, Wei
2014-01-01
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis (PCA), which is perhaps most popular due to its early appearance, simplicity and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to a LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed versus random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g. environmental pollution or life style), the PCs may be able to implicitly and effectively adjust for the confounder while the LMM cannot. Accordingly, to adjust for both population structures and non-genetic confounders, we propose a hybrid method combining the use and thus strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios. PMID:25536929
Optimization of end-members used in multiple linear regression geochemical mixing models
NASA Astrophysics Data System (ADS)
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
Fisher, Charles K.; Mehta, Pankaj
2014-01-01
Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called “errors-in-variables”. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with boostrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroided stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human
Stevens, F. J.; Bobrovnik, S. A.; Biosciences Division; Palladin Inst. Biochemistry
2007-12-01
Physiological responses of the adaptive immune system are polyclonal in nature whether induced by a naturally occurring infection, by vaccination to prevent infection or, in the case of animals, by challenge with antigen to generate reagents of research or commercial significance. The composition of the polyclonal responses is distinct to each individual or animal and changes over time. Differences exist in the affinities of the constituents and their relative proportion of the responsive population. In addition, some of the antibodies bind to different sites on the antigen, whereas other pairs of antibodies are sterically restricted from concurrent interaction with the antigen. Even if generation of a monoclonal antibody is the ultimate goal of a project, the quality of the resulting reagent is ultimately related to the characteristics of the initial immune response. It is probably impossible to quantitatively parse the composition of a polyclonal response to antigen. However, molecular regression allows further parameterization of a polyclonal antiserum in the context of certain simplifying assumptions. The antiserum is described as consisting of two competing populations of high- and low-affinity and unknown relative proportions. This simple model allows the quantitative determination of representative affinities and proportions. These parameters may be of use in evaluating responses to vaccines, to evaluating continuity of antibody production whether in vaccine recipients or animals used for the production of antisera, or in optimizing selection of donors for the production of monoclonal antibodies.
A New Test of Linear Hypotheses in OLS Regression under Heteroscedasticity of Unknown Form
ERIC Educational Resources Information Center
Cai, Li; Hayes, Andrew F.
2008-01-01
When the errors in an ordinary least squares (OLS) regression model are heteroscedastic, hypothesis tests involving the regression coefficients can have Type I error rates that are far from the nominal significance level. Asymptotically, this problem can be rectified with the use of a heteroscedasticity-consistent covariance matrix (HCCM)…
Confidence Intervals for an Effect Size Measure in Multiple Linear Regression
ERIC Educational Resources Information Center
Algina, James; Keselman, H. J.; Penfield, Randall D.
2007-01-01
The increase in the squared multiple correlation coefficient ([Delta]R[squared]) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. The coverage probability that an asymptotic and percentile bootstrap confidence interval includes [Delta][rho][squared] was investigated. As expected,…
ERIC Educational Resources Information Center
Aitkin, Murray A.
Fixed-width confidence intervals for a population regression line over a finite interval of x have recently been derived by Gafarian. The method is extended to provide fixed-width confidence intervals for the difference between two population regression lines, resulting in a simple procedure analogous to the Johnson-Neyman technique. (Author)
NASA Astrophysics Data System (ADS)
Kim, T. W.; Park, G. H.
2014-12-01
Seasonal variation of aragonite saturation state (Ωarag) in the North Pacific Ocean (NPO) was investigated, using multiple linear regression (MLR) models produced from the PACIFICA (Pacific Ocean interior carbon) dataset. Data within depth ranges of 50-1200m were used to derive MLR models, and three parameters (potential temperature, nitrate, and apparent oxygen utilization (AOU)) were chosen as predictor variables because these parameters are associated with vertical mixing, DIC (dissolved inorganic carbon) removal and release which all affect Ωarag in water column directly or indirectly. The PACIFICA dataset was divided into 5° × 5° grids, and a MLR model was produced in each grid, giving total 145 independent MLR models over the NPO. Mean RMSE (root mean square error) and r2 (coefficient of determination) of all derived MLR models were approximately 0.09 and 0.96, respectively. Then the obtained MLR coefficients for each of predictor variables and an intercept were interpolated over the study area, thereby making possible to allocate MLR coefficients to data-sparse ocean regions. Predictability from the interpolated coefficients was evaluated using Hawaiian time-series data, and as a result mean residual between measured and predicted Ωarag values was approximately 0.08, which is less than the mean RMSE of our MLR models. The interpolated MLR coefficients were combined with seasonal climatology of World Ocean Atlas 2013 (1° × 1°) to produce seasonal Ωarag distributions over various depths. Large seasonal variability in Ωarag was manifested in the mid-latitude Western NPO (24-40°N, 130-180°E) and low-latitude Eastern NPO (0-12°N, 115-150°W). In the Western NPO, seasonal fluctuations of water column stratification appeared to be responsible for the seasonal variation in Ωarag (~ 0.5 at 50 m) because it closely followed temperature variations in a layer of 0-75 m. In contrast, remineralization of organic matter was the main cause for the seasonal
Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D
2015-05-01
Transient sensory, motor or cognitive event elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability that can reflect physiologically-important information is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical
The development of a flyover noise prediction technique using multiple linear regression analysis
NASA Astrophysics Data System (ADS)
Rathgeber, R. K.
1981-04-01
At Cessna Aircraft Company, statistical analyses have been developed to define important trends in flyover noise data. Multiple regression techniques have provided the means to develop flyover noise prediction methods which have resulted in better accuracy than methods used in the past. Regression analyses have been conducted to determine the important relationship between propeller helical tip Mach number and the flyover noise level. Other variables have been included in the regression models either because the added variable contributed to reducing the remaining variation in the model or the variable appeared to be a strong causal agent of flyover noise.
NASA Technical Reports Server (NTRS)
Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
NASA Astrophysics Data System (ADS)
Gupta, Kinjal Dhar; Vilalta, Ricardo; Asadourian, Vicken; Macri, Lucas
2014-05-01
We describe an approach to automate the classification of Cepheid variable stars into two subtypes according to their pulsation mode. Automating such classification is relevant to obtain a precise determination of distances to nearby galaxies, which in addition helps reduce the uncertainty in the current expansion of the universe. One main difficulty lies in the compatibility of models trained using different galaxy datasets; a model trained using a training dataset may be ineffectual on a testing set. A solution to such difficulty is to adapt predictive models across domains; this is necessary when the training and testing sets do not follow the same distribution. The gist of our methodology is to train a predictive model on a nearby galaxy (e.g., Large Magellanic Cloud), followed by a model-adaptation step to make the model operable on other nearby galaxies. We follow a parametric approach to density estimation by modeling the training data (anchor galaxy) using a mixture of linear models. We then use maximum likelihood to compute the right amount of variable displacement, until the testing data closely overlaps the training data. At that point, the model can be directly used in the testing data (target galaxy).
NASA Astrophysics Data System (ADS)
Christiansen, Bo
2015-04-01
Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first order approximations and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration that should be treated as the independent variable. These decisions are often not easy to make but they may have a considerable impact on the results. We seek to give a unified probabilistic - Bayesian with flat priors - treatment of univariate linear regression and prediction by taking, as starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference can not be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.
ERIC Educational Resources Information Center
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
2010-01-01
In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
ERIC Educational Resources Information Center
Wing, Coady; Cook, Thomas D.
2013-01-01
The sharp regression discontinuity design (RDD) has three key weaknesses compared to the randomized clinical trial (RCT). It has lower statistical power, it is more dependent on statistical modeling assumptions, and its treatment effect estimates are limited to the narrow subpopulation of cases immediately around the cutoff, which is rarely of…
An Additional Measure of Overall Effect Size for Logistic Regression Models
ERIC Educational Resources Information Center
Allen, Jeff; Le, Huy
2008-01-01
Users of logistic regression models often need to describe the overall predictive strength, or effect size, of the model's predictors. Analogs of R[superscript 2] have been developed, but none of these measures are interpretable on the same scale as effects of individual predictors. Furthermore, R[superscript 2] analogs are not invariant to the…
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
NASA Astrophysics Data System (ADS)
Ji, Yanju; Huang, Wanyu; Yu, Mingmei; Guan, Shanshan; Wang, Yuan; Zhu, Yu
2017-01-01
This article studies full-waveform associated identification method of airborne time-domain electromagnetic method (ATEM) 3-d anomalies based on multiple linear regression analysis method. By using convolution algorithm, full-waveform theoretical responses are computed to derive sample library including switch-off-time period responses and off-time period responses. Extract full-waveform attributes from theoretical responses to derive linear regression equations which are used to identify the geological parameters. In order to improve the precision ulteriorly, we optimize the identification method by separating the sample library into different groups and identify the parameter respectively. Performance of full-waveform associated identification method with field data of wire-loop test experiments with ATEM system in Daedao of Changchun proves that the full-waveform associated identification method is feasible practically.
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression.
Beckstead, Jason W
2012-03-30
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic strategy to isolate, examine, and remove suppression effects has been offered. In this article such an approach, rooted in confirmatory factor analysis theory and employing matrix algebra, is developed. Suppression is viewed as the result of criterion-irrelevant variance operating among predictors. Decomposition of predictor variables into criterion-relevant and criterion-irrelevant components using structural equation modeling permits derivation of regression weights with the effects of criterion-irrelevant variance omitted. Three examples with data from applied research are used to illustrate the approach: the first assesses child and parent characteristics to explain why some parents of children with obsessive-compulsive disorder accommodate their child's compulsions more so than do others, the second examines various dimensions of personal health to explain individual differences in global quality of life among patients following heart surgery, and the third deals with quantifying the relative importance of various aptitudes for explaining academic performance in a sample of nursing students. The approach is offered as an analytic tool for investigators interested in understanding predictor-criterion relationships when complex patterns of intercorrelation among predictors are present and is shown to augment dominance analysis.
Metrology and 1/f noise: linear regressions and confidence intervals in flicker noise context
NASA Astrophysics Data System (ADS)
Vernotte, F.; Lantz, E.
2015-04-01
1/f noise is very common but is difficult to handle in a metrological way. After having recalled the main characteristics of a strongly correlated noise, this paper will determine relationships giving confidence intervals over the arithmetic mean and the linear drift parameters. A complete example of processing of an actual measurement sequence affected by 1/f noise will be given.
Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.
ERIC Educational Resources Information Center
Vidal, Sherry
Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Additional results on 'Reducing geometric dilution of precision using ridge regression'
NASA Astrophysics Data System (ADS)
Kelly, Robert J.
1990-07-01
Kelly (1990) presented preliminary results on the feasibility of using ridge regression (RR) to reduce the effects of geometric dilution of precision (GDOP) error inflation in position-fix navigation systems. Recent results indicate that RR will not reduce GDOP bias inflation when biaslike measurement errors last much longer than the aircraft guidance-loop response time. This conclusion precludes the use of RR on navigation systems whose dominant error sources are biaslike; e.g., the GPS selective-availability error source. The simulation results given by Kelly are, however, valid for the conditions defined. Although RR has not yielded a satisfactory solution to the general GDOP problem, it has illuminated the role that multicollinearity plays in navigation signal processors such as the Kalman filter. Bias inflation, initial position guess errors, ridge-parameter selection methodology, and the recursive ridge filter are discussed.
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The
Schilling, K.E.; Wolter, C.F.
2005-01-01
Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs) and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. Combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture dominated watersheds in the glaciated Midwest. (JAWRA) (Copyright ?? 2005).
The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R
Li, Xingguo; Zhao, Tuo; Yuan, Xiaoming; Liu, Han
2016-01-01
This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, ℓq Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME). These methods exploit different nonsmooth loss functions to gain modeling exibility, estimation robustness, and tuning insensitiveness. The developed solver is based on the alternating direction method of multipliers (ADMM), which is further accelerated by the multistage screening approach. The package flare is coded in double precision C, and called from R by a user-friendly interface. The memory usage is optimized by using the sparse matrix output. The experiments show that flare is efficient and can scale up to large problems.
Abad, Cesar C. C.; Barros, Ronaldo V.; Bertuzzi, Romulo; Gagliardi, João F. L.; Lima-Silva, Adriano E.; Lambert, Mike I.
2016-01-01
Abstract The aim of this study was to verify the power of VO2max, peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO2max and PTV; 2) a constant submaximal run at 12 km·h−1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO2max, PTV and RE) and adjusted variables (VO2max0.72, PTV0.72 and RE0.60) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO2max. Significant correlations (p < 0.01) were found between 10 km running time and adjusted and unadjusted RE and PTV, providing models with effect size > 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV0.72 and RE0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PVT accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation. PMID:28149382
Pérez, Paulino; de los Campos, Gustavo; Crossa, José; Gianola, Daniel
2010-01-01
The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, not always available to the end user and not implemented in standard statistical software yet. The R-package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO) in a unifi ed framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed. PMID:21566722
Theobald, Roddy; Freeman, Scott
2014-01-01
Although researchers in undergraduate science, technology, engineering, and mathematics education are currently using several methods to analyze learning gains from pre- and posttest data, the most commonly used approaches have significant shortcomings. Chief among these is the inability to distinguish whether differences in learning gains are due to the effect of an instructional intervention or to differences in student characteristics when students cannot be assigned to control and treatment groups at random. Using pre- and posttest scores from an introductory biology course, we illustrate how the methods currently in wide use can lead to erroneous conclusions, and how multiple linear regression offers an effective framework for distinguishing the impact of an instructional intervention from the impact of student characteristics on test score gains. In general, we recommend that researchers always use student-level regression models that control for possible differences in student ability and preparation to estimate the effect of any nonrandomized instructional intervention on student performance.
Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M
2014-01-01
This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LPQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LPQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.
NASA Technical Reports Server (NTRS)
Lo, Ching F.
1999-01-01
The integration of Radial Basis Function Networks and Back Propagation Neural Networks with the Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modem Design of Experiments. The integrated method is capable to estimate the precision intervals including confidence and predicted intervals. The power of the innovative method has been demonstrated by applying to a set of wind tunnel test data in construction of response surface and estimation of precision interval.
Coelho, Lúcia H G; Gutz, Ivano G R
2006-03-15
A chemometric method for analysis of conductometric titration data was introduced to extend its applicability to lower concentrations and more complex acid-base systems. Auxiliary pH measurements were made during the titration to assist the calculation of the distribution of protonable species on base of known or guessed equilibrium constants. Conductivity values of each ionized or ionizable species possibly present in the sample were introduced in a general equation where the only unknown parameters were the total concentrations of (conjugated) bases and of strong electrolytes not involved in acid-base equilibria. All these concentrations were adjusted by a multiparametric nonlinear regression (NLR) method, based on the Levenberg-Marquardt algorithm. This first conductometric titration method with NLR analysis (CT-NLR) was successfully applied to simulated conductometric titration data and to synthetic samples with multiple components at concentrations as low as those found in rainwater (approximately 10 micromol L(-1)). It was possible to resolve and quantify mixtures containing a strong acid, formic acid, acetic acid, ammonium ion, bicarbonate and inert electrolyte with accuracy of 5% or better.
NASA Astrophysics Data System (ADS)
Chen, Wei-Yin; Ding, Li-Fu; Chen, Liang-Gee
2007-01-01
Luminance and chrominance correction (LCC) is important in multi-view video coding (MVC) because it provides better rate-distortion performance when encoding video sequences captured by ill-calibrated multi-view cameras. This paper presents a robust and fast LCC algorithm based on motion compensated linear regression which reuses the motion information from the encoder. We adopt the linear weighted prediction model in H.264/AVC as our LCC model. In our experiments, the proposed LCC algorithm outperforms basic histogram matching method up to 0.4dB with only few computational overhead and zero external memory bandwidth. So, the dataflow of this method is suitable for low bandwidth/low power VLSI design for future multi-view applications.
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis and depreciate the significance of discriminant function and discrimination abilities of individual variables in discrimination analysis. The discussions will be restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experiment results, which the authors believe is very critical to research progress in theory development and cumulative knowledge in the ergonomics field.
Gu, Yun; Wang, Yao; Yu, Tian-Yang; Liang, Yong-Min; Xu, Peng-Fei
2014-12-15
The development of a direct vinylogous Michael addition of linear nucleophilic substrates is a long-standing challenge because of the poor reactivity and the considerable difficulty in controlling regioselectivity. By employing a rationally designed multifunctional supramolecular iminium catalysis strategy, the first direct vinylogous Michael addition of unmodified linear substrates to α,β-unsaturated aldehydes, to afford chiral 1,7-dioxo compounds with good yields and excellent regio- as well as enantioselectivity, has been developed.
Outlier detection method in linear regression based on sum of arithmetic progression.
Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T
2014-01-01
We introduce a new nonparametric outlier detection method for linear series, which requires no missing or removed data imputation. For an arithmetic progression (a series without outliers) with n elements, the ratio (R) of the sum of the minimum and the maximum elements and the sum of all elements is always 2/n : (0,1]. R ≠ 2/n always implies the existence of outliers. Usually, R < 2/n implies that the minimum is an outlier, and R > 2/n implies that the maximum is an outlier. Based upon this, we derived a new method for identifying significant and nonsignificant outliers, separately. Two different techniques were used to manage missing data and removed outliers: (1) recalculate the terms after (or before) the removed or missing element while maintaining the initial angle in relation to a certain point or (2) transform data into a constant value, which is not affected by missing or removed elements. With a reference element, which was not an outlier, the method detected all outliers from data sets with 6 to 1000 elements containing 50% outliers which deviated by a factor of ±1.0e - 2 to ±1.0e + 2 from the correct value.
A componential model of human interaction with graphs: 1. Linear regression modeling
NASA Technical Reports Server (NTRS)
Gillan, Douglas J.; Lewis, Robert
1994-01-01
Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes-searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and repsonding; and (2) that the type of graph and user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications in the MA-P model, alternative models, and design implications from the MA-P model.
Nucleus detection using gradient orientation information and linear least squares regression
NASA Astrophysics Data System (ADS)
Kwak, Jin Tae; Hewitt, Stephen M.; Xu, Sheng; Pinto, Peter A.; Wood, Bradford J.
2015-03-01
Computerized histopathology image analysis enables an objective, efficient, and quantitative assessment of digitized histopathology images. Such analysis often requires an accurate and efficient detection and segmentation of histological structures such as glands, cells and nuclei. The segmentation is used to characterize tissue specimens and to determine the disease status or outcomes. The segmentation of nuclei, in particular, is challenging due to the overlapping or clumped nuclei. Here, we propose a nuclei seed detection method for the individual and overlapping nuclei that utilizes the gradient orientation or direction information. The initial nuclei segmentation is provided by a multiview boosting approach. The angle of the gradient orientation is computed and traced for the nuclear boundaries. Taking the first derivative of the angle of the gradient orientation, high concavity points (junctions) are discovered. False junctions are found and removed by adopting a greedy search scheme with the goodness-of-fit statistic in a linear least squares sense. Then, the junctions determine boundary segments. Partial boundary segments belonging to the same nucleus are identified and combined by examining the overlapping area between them. Using the final set of the boundary segments, we generate the list of seeds in tissue images. The method achieved an overall precision of 0.89 and a recall of 0.88 in comparison to the manual segmentation.
NASA Technical Reports Server (NTRS)
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf
2015-01-01
The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200–400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037
NASA Astrophysics Data System (ADS)
Elliott, J.; de Souza, R. S.; Krone-Martins, A.; Cameron, E.; Ishida, E. E. O.; Hilbe, J.
2015-04-01
Machine learning techniques offer a precious tool box for use within astronomy to solve problems involving so-called big data. They provide a means to make accurate predictions about a particular system without prior knowledge of the underlying physical processes of the data. In this article, and the companion papers of this series, we present the set of Generalized Linear Models (GLMs) as a fast alternative method for tackling general astronomical problems, including the ones related to the machine learning paradigm. To demonstrate the applicability of GLMs to inherently positive and continuous physical observables, we explore their use in estimating the photometric redshifts of galaxies from their multi-wavelength photometry. Using the gamma family with a log link function we predict redshifts from the PHoto-z Accuracy Testing simulated catalogue and a subset of the Sloan Digital Sky Survey from Data Release 10. We obtain fits that result in catastrophic outlier rates as low as ∼1% for simulated and ∼2% for real data. Moreover, we can easily obtain such levels of precision within a matter of seconds on a normal desktop computer and with training sets that contain merely thousands of galaxies. Our software is made publicly available as a user-friendly package developed in Python, R and via an interactive web application. This software allows users to apply a set of GLMs to their own photometric catalogues and generates publication quality plots with minimum effort. By facilitating their ease of use to the astronomical community, this paper series aims to make GLMs widely known and to encourage their implementation in future large-scale projects, such as the Large Synoptic Survey Telescope.
Polak, Adam G
2011-02-01
Many patients undergo long-term artificial ventilation and their respiratory system mechanics should be monitored to detect changes in the patient's state and to optimize ventilator settings. In this work the most popular algorithms for tracking variations of respiratory resistance (R(rs)) and elastance (E(rs)) over a ventilatory cycle were analysed in terms of systematic and random errors. Additionally, a new approach was proposed and compared to the previous ones. It takes into account an exact description of flow integration by volume-dependent lung compliance. The results of analyses showed advantages of this new approach and enabled to form several suggestions. Algorithms including R(rs) and E(rs) dependencies on airflow and lung volume can be effectively applied only at low levels of noise present in measurement data, otherwise the use of the simplest model with constant parameters is preferable. Additionally, one should avoid including the resistance dependence on airflow alone, since this considerably destroys the retrieved trace of R(rs). Finally, the estimated cyclic trajectories of R(rs) and E(rs) are more sensitive to noise present in pressure than in the flow signal, and the elastance traces are estimated more accurately than the resistance ones.
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
A study of thermal response of concrete towers employing linear regression
NASA Astrophysics Data System (ADS)
Norouzi, Mehdi; Zarbaf, Seyed Ehsan Haji Agha Mohammad; Dalvi, Aditi; Hunt, Victor; Helmicki, Arthur
2016-04-01
It has been shown that the variations of structural properties due to changing environmental conditions such as temperature can be as significant as those caused by structural damage and even liveload. Therefore, tracking changes that are correlated with environmental variations is a necessary step in order to detect and assess structural damage in addition to the normal structural response to traffic. In this paper, daily measurement data that is collected from the concrete towers of the Ironton-Russell Bridge will be presented and correlation of the collected measurement data and temperature will be overviewed. Variation of the daily thermal response of tower concrete walls will be compared with the daily thermal responses of the steel box within the tower and finally, thermal coefficient for compensating the thermal induced responses will be estimated.
Brown, A M
2001-06-01
The objective of this present study was to introduce a simple, easily understood method for carrying out non-linear regression analysis based on user input functions. While it is relatively straightforward to fit data with simple functions such as linear or logarithmic functions, fitting data with more complicated non-linear functions is more difficult. Commercial specialist programmes are available that will carry out this analysis, but these programmes are expensive and are not intuitive to learn. An alternative method described here is to use the SOLVER function of the ubiquitous spreadsheet programme Microsoft Excel, which employs an iterative least squares fitting routine to produce the optimal goodness of fit between data and function. The intent of this paper is to lead the reader through an easily understood step-by-step guide to implementing this method, which can be applied to any function in the form y=f(x), and is well suited to fast, reliable analysis of data in all fields of biology.
Jahandideh, Sepideh Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina
2009-11-15
Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing total of 50 hospitals of Fars province (Iran) were used to verify the performance of the models. Three performance measures including MAR, RMSE and R{sup 2} were used to evaluate performance of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R{sup 2} confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.
Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina
2009-11-01
Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing total of 50 hospitals of Fars province (Iran) were used to verify the performance of the models. Three performance measures including MAR, RMSE and R(2) were used to evaluate performance of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R(2) confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.
Zisi, Chrysostomi; Sampsonidis, Ioannis; Fasoula, Stella; Papachristos, Konstantinos; Witting, Michael; Gika, Helen G.; Nikitas, Panagiotis; Pappa-Louisi, Adriani
2017-01-01
Modified quantitative structure retention relationships (QSRRs) are proposed and applied to describe two retention data sets: A set of 94 metabolites studied by a hydrophilic interaction chromatography system under organic content gradient conditions and a set of tryptophan and its major metabolites analyzed by a reversed-phase chromatographic system under isocratic as well as pH and/or simultaneous pH and organic content gradient conditions. According to the proposed modification, an additional descriptor is added to a conventional QSRR expression, which is the analyte retention time, tR(R), measured under the same elution conditions, but in a second chromatographic column considered as a reference one. The 94 metabolites were studied on an Amide column using a Bare Silica column as a reference. For the second dataset, a Kinetex EVO C18 and a Gemini-NX column were used, where each of them was served as a reference column of the other. We found in all cases a significant improvement of the performance of the QSRR models when the descriptor tR(R) was considered. PMID:28208794
2015-01-01
The weighted histogram analysis method (WHAM) is a standard protocol for postprocessing the information from biased umbrella sampling simulations to construct the potential of mean force with respect to a set of order parameters. By virtue of the WHAM equations, the unbiased density of state is determined by satisfying a self-consistent condition through an iterative procedure. While the method works very effectively when the number of order parameters is small, its computational cost grows rapidly in higher dimension. Here, we present a simple and efficient alternative strategy, which avoids solving the self-consistent WHAM equations iteratively. An efficient multivariate linear regression framework is utilized to link the biased probability densities of individual umbrella windows and yield an unbiased global free energy landscape in the space of order parameters. It is demonstrated with practical examples that free energy landscapes that are comparable in accuracy to WHAM can be generated at a small fraction of the cost. PMID:26574437
NASA Astrophysics Data System (ADS)
dos Santos, T. S.; Mendes, D.; Torres, R. R.
2015-08-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model out- put and observed monthly precipitation. We used GCMs experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.
NASA Astrophysics Data System (ADS)
Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.
2016-01-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
Meng, Yilin; Roux, Benoît
2015-08-11
The weighted histogram analysis method (WHAM) is a standard protocol for postprocessing the information from biased umbrella sampling simulations to construct the potential of mean force with respect to a set of order parameters. By virtue of the WHAM equations, the unbiased density of state is determined by satisfying a self-consistent condition through an iterative procedure. While the method works very effectively when the number of order parameters is small, its computational cost grows rapidly in higher dimension. Here, we present a simple and efficient alternative strategy, which avoids solving the self-consistent WHAM equations iteratively. An efficient multivariate linear regression framework is utilized to link the biased probability densities of individual umbrella windows and yield an unbiased global free energy landscape in the space of order parameters. It is demonstrated with practical examples that free energy landscapes that are comparable in accuracy to WHAM can be generated at a small fraction of the cost.
Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena
2013-01-01
The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques used were assessed and discussed on the basis of different validation criteria and results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLR have, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R(2) of 0.874 and RMSE of 0.437 against R(2) of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis showing an activity close to that predicted by the majority of the models.
Olaya-Abril, Alfonso; Parras-Alcántara, Luis; Lozano-García, Beatriz; Obregón-Romero, Rafael
2017-03-15
Over time, the interest on soil studies has increased due to its role in carbon sequestration in terrestrial ecosystems, which could contribute to decreasing atmospheric CO2 rates. In many studies, independent variables were related to soil organic carbon (SOC) alone, however, the contribution degree of each variable with the experimentally determined SOC content were not considered. In this study, samples from 612 soil profiles were obtained in a natural protected (Red Natura 2000) of Sierra Morena (Mediterranean area, South Spain), considering only the topsoil 0-25cm, for better comparison between results. 24 independent variables were used to define it relationship with SOC content. Subsequently, using a multiple linear regression analysis, the effects of these variables on the SOC correlation was considered. Finally, the best parameters determined with the regression analysis were used in a climatic change scenario. The model indicated that SOC in a future scenario of climate change depends on average temperature of coldest quarter (41.9%), average temperature of warmest quarter (34.5%), annual precipitation (22.2%) and annual average temperature (1.3%). When the current and future situations were compared, the SOC content in the study area was reduced a 35.4%, and a trend towards migration to higher latitude and altitude was observed.
NASA Astrophysics Data System (ADS)
Ebrahimi, Hadi; Rajaee, Taher
2017-01-01
Simulation of groundwater level (GWL) fluctuations is an important task in management of groundwater resources. In this study, the effect of wavelet analysis on the training of the artificial neural network (ANN), multi linear regression (MLR) and support vector regression (SVR) approaches was investigated, and the ANN, MLR and SVR along with the wavelet-ANN (WNN), wavelet-MLR (WLR) and wavelet-SVR (WSVR) models were compared in simulating one-month-ahead of GWL. The only variable used to develop the models was the monthly GWL data recorded over a period of 11 years from two wells in the Qom plain, Iran. The results showed that decomposing GWL time series into several sub-time series, extremely improved the training of the models. For both wells 1 and 2, the Meyer and Db5 wavelets produced better results compared to the other wavelets; which indicated wavelet types had similar behavior in similar case studies. The optimal number of delays was 6 months, which seems to be due to natural phenomena. The best WNN model, using Meyer mother wavelet with two decomposition levels, simulated one-month-ahead with RMSE values being equal to 0.069 m and 0.154 m for wells 1 and 2, respectively. The RMSE values for the WLR model were 0.058 m and 0.111 m, and for WSVR model were 0.136 m and 0.060 m for wells 1 and 2, respectively.
Naguib, Ibrahim A.; Abdelaleem, Eglal A.; Zaazaa, Hala E.; Hussein, Essraa A.
2015-01-01
A comparison between partial least squares regression and support vector regression chemometric models is introduced in this study. The two models are implemented to analyze cefoperazone sodium in presence of its reported impurities, 7-aminocephalosporanic acid and 5-mercapto-1-methyl-tetrazole, in pure powders and in pharmaceutical formulations through processing UV spectroscopic data. For best results, a 3-factor 4-level experimental design was used, resulting in a training set of 16 mixtures containing different ratios of interfering moieties. For method validation, an independent test set consisting of 9 mixtures was used to test predictive ability of established models. The introduced results show the capability of the two proposed models to analyze cefoperazone in presence of its impurities 7-aminocephalosporanic acid and 5-mercapto-1-methyl-tetrazole with high trueness and selectivity (101.87 ± 0.708 and 101.43 ± 0.536 for PLSR and linear SVR, resp.). Analysis results of drug products were statistically compared to a reported HPLC method showing no significant difference in trueness and precision, indicating the capability of the suggested multivariate calibration models to be reliable and adequate for routine quality control analysis of drug product. SVR offers more accurate results with lower prediction error compared to PLSR model; however, PLSR is easy to handle and fast to optimize. PMID:26664764
Silva, Ana Elisa Pereira; Freitas, Corina da Costa; Dutra, Luciano Vieira; Molento, Marcelo Beltrão
2016-02-15
Fasciola hepatica is the causative agent of fasciolosis, a disease that triggers a chronic inflammatory process in the liver affecting mainly ruminants and other animals including humans. In Brazil, F. hepatica occurs in larger numbers in the most Southern state of Rio Grande do Sul. The objective of this study was to estimate areas at risk using an eight-year (2002-2010) time series of climatic and environmental variables that best relate to the disease using a linear regression method to municipalities in the state of Rio Grande do Sul. The positivity index of the disease, which is the rate of infected animal per slaughtered animal, was divided into three risk classes: low, medium and high. The accuracy of the known sample classification on the confusion matrix for the low, medium and high rates produced by the estimated model presented values between 39 and 88% depending of the year. The regression analysis showed the importance of the time-based data for the construction of the model, considering the two variables of the previous year of the event (positivity index and maximum temperature). The generated data is important for epidemiological and parasite control studies mainly because F. hepatica is an infection that can last from months to years.
Jäntschi, Lorentz
2016-01-01
Multiple linear regression analysis is widely used to link an outcome with predictors for better understanding of the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients on linear models with two predictors without any constrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as proof-of-concept using fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated by molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error is significantly different by the convenient value of two when the Gauss-Laplace distribution was used to relax the constrictive assumption of the normal distribution of the error. Therefore, the Gauss-Laplace distribution of the error could not be rejected while the hypothesis that the power of the error from Gauss-Laplace distribution is normal distributed also failed to be rejected. PMID:28090215
Rafiei, Hamid; Khanzadeh, Marziyeh; Mozaffari, Shahla; Bostanifar, Mohammad Hassan; Avval, Zhila Mohajeri; Aalizadeh, Reza; Pourbasheer, Eslam
2016-01-01
Quantitative structure-activity relationship (QSAR) study has been employed for predicting the inhibitory activities of the Hepatitis C virus (HCV) NS5B polymerase inhibitors. A data set consisted of 72 compounds was selected, and then different types of molecular descriptors were calculated. The whole data set was split into a training set (80 % of the dataset) and a test set (20 % of the dataset) using principle component analysis. The stepwise (SW) and the genetic algorithm (GA) techniques were used as variable selection tools. Multiple linear regression method was then used to linearly correlate the selected descriptors with inhibitory activities. Several validation technique including leave-one-out and leave-group-out cross-validation, Y-randomization method were used to evaluate the internal capability of the derived models. The external prediction ability of the derived models was further analyzed using modified r2, concordance correlation coefficient values and Golbraikh and Tropsha acceptable model criteria's. Based on the derived results (GA-MLR), some new insights toward molecular structural requirements for obtaining better inhibitory activity were obtained. PMID:27065774
Jäntschi, Lorentz; Bálint, Donatella; Bolboacă, Sorana D
2016-01-01
Multiple linear regression analysis is widely used to link an outcome with predictors for better understanding of the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients on linear models with two predictors without any constrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as proof-of-concept using fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated by molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error is significantly different by the convenient value of two when the Gauss-Laplace distribution was used to relax the constrictive assumption of the normal distribution of the error. Therefore, the Gauss-Laplace distribution of the error could not be rejected while the hypothesis that the power of the error from Gauss-Laplace distribution is normal distributed also failed to be rejected.
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
Kokaly, R.F.; Clark, R.N.
1999-01-01
We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 ??m, 2.10 ??m, and 2.30 ??m that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R2 from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.We develop a new method for estimating the biochemistry of plant material using
NASA Astrophysics Data System (ADS)
Lee, C. Y.; Tippett, M. K.; Sobel, A. H.; Camargo, S. J.
2014-12-01
We are working towards the development of a new statistical-dynamical downscaling system to study the influence of climate on tropical cyclones (TCs). The first step is development of an appropriate model for TC intensity as a function of environmental variables. We approach this issue with a stochastic model consisting of a multiple linear regression model (MLR) for 12-hour intensity forecasts as a deterministic component, and a random error generator as a stochastic component. Similar to the operational Statistical Hurricane Intensity Prediction Scheme (SHIPS), MLR relates the surrounding environment to storm intensity, but with only essential predictors calculated from monthly-mean NCEP reanalysis fields (potential intensity, shear, etc.) and from persistence. The deterministic MLR is developed with data from 1981-1999 and tested with data from 2000-2012 for the Atlantic, Eastern North Pacific, Western North Pacific, Indian Ocean, and Southern Hemisphere basins. While the global MLR's skill is comparable to that of the operational statistical models (e.g., SHIPS), the distribution of the predicted maximum intensity from deterministic results has a systematic low bias compared to observations; the deterministic MLR creates almost no storms with intensities greater than 100 kt. The deterministic MLR can be significantly improved by adding the stochastic component, based on the distribution of random forecasting errors from the deterministic model compared to the training data. This stochastic component may be thought of as representing the component of TC intensification that is not linearly related to the environmental variables. We find that in order for the stochastic model to accurately capture the observed distribution of maximum storm intensities, the stochastic component must be auto-correlated across 12-hour time steps. This presentation also includes a detailed discussion of the distributions of other TC-intensity related quantities, as well as the inter
Martin, L; Mezcua, M; Ferrer, C; Gil Garcia, M D; Malato, O; Fernandez-Alba, A R
2013-01-01
The main objective of this work was to establish a mathematical function that correlates pesticide residue levels in apple juice with the levels of the pesticides applied on the raw fruit, taking into account some of their physicochemical properties such as water solubility, the octanol/water partition coefficient, the organic carbon partition coefficient, vapour pressure and density. A mixture of 12 pesticides was applied to an apple tree; apples were collected after 10 days of application. After harvest, apples were treated with a mixture of three post-harvest pesticides and the fruits were then processed in order to obtain apple juice following a routine industrial process. The pesticide residue levels in the apple samples were analysed using two multi-residue methods based on LC-MS/MS and GC-MS/MS. The concentration of pesticides was determined in samples derived from the different steps of processing. The processing factors (the coefficient between residue level in the processed commodity and the residue level in the commodity to be processed) obtained for the full juicing process were found to vary among the different pesticides studied. In order to investigate the relationships between the levels of pesticide residue found in apple juice samples and their physicochemical properties, principal component analysis (PCA) was performed using two sets of samples (one of them using experimental data obtained in this work and the other including the data taken from the literature). In both cases the correlation was found between processing factors of pesticides in the apple juice and the negative logarithms (base 10) of the water solubility, octanol/water partition coefficient and organic carbon partition coefficient. The linear correlation between these physicochemical properties and the processing factor were established using a multiple linear regression technique.
NASA Astrophysics Data System (ADS)
Deml, Ann M.; O'Hayre, Ryan; Wolverton, Chris; Stevanović, Vladan
2016-02-01
The availability of quantitatively accurate total energies (Etot) of atoms, molecules, and solids, enabled by the development of density functional theory (DFT), has transformed solid state physics, quantum chemistry, and materials science by allowing direct calculations of measureable quantities, such as enthalpies of formation (Δ Hf ). Still, the ability to compute Etot and Δ Hf values does not, necessarily, provide insights into the physical mechanisms behind their magnitudes or chemical trends. Here, we examine a large set of calculated Etot and Δ Hf values obtained from the DFT+U -based fitted elemental-phase reference energies (FERE) approach [V. Stevanović, S. Lany, X. Zhang, and A. Zunger, Phys. Rev. B 85, 115104 (2012), 10.1103/PhysRevB.85.115104] to probe relationships between the Etot/Δ Hf of metal-nonmetal compounds in their ground-state crystal structures and properties describing the compound compositions and their elemental constituents. From a stepwise linear regression, we develop a linear model for Etot, and consequently Δ Hf , that reproduces calculated FERE values with a mean absolute error of ˜80 meV/atom. The most significant contributions to the model include calculated total energies of the constituent elements in their reference phases (e.g., metallic iron or gas phase O2), atomic ionization energies and electron affinities, Pauling electronegativity differences, and atomic electric polarizabilities. These contributions are discussed in the context of their connection to the underlying physics. We also demonstrate that our Etot/Δ Hf model can be directly extended to predict the Etot and Δ Hf of compounds outside the set used to develop the model.
Estimate of influenza cases using generalized linear, additive and mixed models.
Oviedo, Manuel; Domínguez, Ángela; Pilar Muñoz, M
2015-01-01
We investigated the relationship between reported cases of influenza in Catalonia (Spain). Covariates analyzed were: population, age, data of report of influenza, and health region during 2010-2014 using data obtained from the SISAP program (Institut Catala de la Salut - Generalitat of Catalonia). Reported cases were related with the study of covariates using a descriptive analysis. Generalized Linear Models, Generalized Additive Models and Generalized Additive Mixed Models were used to estimate the evolution of the transmission of influenza. Additive models can estimate non-linear effects of the covariates by smooth functions; and mixed models can estimate data dependence and variability in factor variables using correlations structures and random effects, respectively. The incidence rate of influenza was calculated as the incidence per 100 000 people. The mean rate was 13.75 (range 0-27.5) in the winter months (December, January, February) and 3.38 (range 0-12.57) in the remaining months. Statistical analysis showed that Generalized Additive Mixed Models were better adapted to the temporal evolution of influenza (serial correlation 0.59) than classical linear models.
Svendsen, Carina; Skov, Thomas; van den Berg, Frans W J
2016-07-22
Fluorescence spectroscopy is a sensitive and selective technique, which can be of great value in bioprocesses to provide online, real-time measures of chemical compounds. Although fluorescence spectroscopy is a widely studied method, not much attention has been given to issues concerning intensity variations in the fluorescence landscapes due to pH fluctuations. This study elucidates how pH fluctuations cause intensity changes in fluorescence measurements and thereby decreases the quality of the subsequent quantification. A photo-degradation process of riboflavin was investigated by fluorescence spectroscopy and used as a model system. A two-step modeling approach, combining weighted PARAllel FACtor analysis (PARAFAC) with weighted non-linear regression of the known reaction kinetics, is suggested as a way of handling the fluorescence intensity shifts caused by the pH changes. The suggested strategy makes it possible to compensate for uncertainties in the shifted data and thereby obtain more reliable concentration profiles for the chemical compounds and kinetic parameters of the reaction.
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series. PMID:24895666
Alexeeff, Stacey E; Carroll, Raymond J; Coull, Brent
2016-04-01
Spatial modeling of air pollution exposures is widespread in air pollution epidemiology research as a way to improve exposure assessment. However, there are key sources of exposure model uncertainty when air pollution is modeled, including estimation error and model misspecification. We examine the use of predicted air pollution levels in linear health effect models under a measurement error framework. For the prediction of air pollution exposures, we consider a universal Kriging framework, which may include land-use regression terms in the mean function and a spatial covariance structure for the residuals. We derive the bias induced by estimation error and by model misspecification in the exposure model, and we find that a misspecified exposure model can induce asymptotic bias in the effect estimate of air pollution on health. We propose a new spatial simulation extrapolation (SIMEX) procedure, and we demonstrate that the procedure has good performance in correcting this asymptotic bias. We illustrate spatial SIMEX in a study of air pollution and birthweight in Massachusetts.
Kamruzzaman, Md; Mamun, A S M A; Bakar, Sheikh Muhammad Abu; Saw, Aik; Kamarul, T; Islam, Md Nurul; Hossain, Md Golam
2016-11-21
The aim of this study was to investigate the socioeconomic and demographic factors influencing the body mass index (BMI) of non-pregnant married Bangladeshi women of reproductive age. Secondary (Hierarchy) data from the 2011 Bangladesh Demographic and Health Survey, collected using two-stage stratified cluster sampling, were used. Two-level linear regression analysis was performed to remove the cluster effect of the variables. The mean BMI of married non-pregnant Bangladeshi women was 21.60±3.86 kg/m2, and the prevalence of underweight, overweight and obesity was 22.8%, 14.9% and 3.2%, respectively. After removing the cluster effect, age and age at first marriage were found to be positively (p<0.01) related with BMI. Number of children was negatively related with women's BMI. Lower BMI was especially found among women from rural areas and poor families, with an uneducated husband, with no television at home and who were currently breast-feeding. Age, total children ever born, age at first marriage, type of residence, education level, level of husband's education, wealth index, having a television at home and practising breast-feeding were found to be important predictors for the BMI of married Bangladeshi non-pregnant women of reproductive age. This information could be used to identify sections of the Bangladeshi population that require special attention, and to develop more effective strategies to resolve the problem of malnutrition.
NASA Astrophysics Data System (ADS)
de Souza, R. S.; Hilbe, J. M.; Buelens, B.; Riggs, J. D.; Cameron, E.; Ishida, E. E. O.; Chies-Santos, A. L.; Killedar, M.
2015-10-01
In this paper, the third in a series illustrating the power of generalized linear models (GLMs) for the astronomical community, we elucidate the potential of the class of GLMs which handles count data. The size of a galaxy's globular cluster (GC) population (NGC) is a prolonged puzzle in the astronomical literature. It falls in the category of count data analysis, yet it is usually modelled as if it were a continuous response variable. We have developed a Bayesian negative binomial regression model to study the connection between NGC and the following galaxy properties: central black hole mass, dynamical bulge mass, bulge velocity dispersion and absolute visual magnitude. The methodology introduced herein naturally accounts for heteroscedasticity, intrinsic scatter, errors in measurements in both axes (either discrete or continuous) and allows modelling the population of GCs on their natural scale as a non-negative integer variable. Prediction intervals of 99 per cent around the trend for expected NGC comfortably envelope the data, notably including the Milky Way, which has hitherto been considered a problematic outlier. Finally, we demonstrate how random intercept models can incorporate information of each particular galaxy morphological type. Bayesian variable selection methodology allows for automatically identifying galaxy types with different productions of GCs, suggesting that on average S0 galaxies have a GC population 35 per cent smaller than other types with similar brightness.
NASA Astrophysics Data System (ADS)
Lee, Taesam; Ouarda, Taha B. M. J.; Yoon, Sunkwon
2017-02-01
Climate change frequently causes highly nonlinear and irregular behaviors in hydroclimatic systems. The stochastic simulation of hydroclimatic variables reproduces such irregular behaviors and is beneficial for assessing their impact on other regimes. The objective of the current study is to propose a novel method, a k-nearest neighbor (KNN) based on the local linear regression method (KLR), to reproduce nonlinear and heteroscedastic relations in hydroclimatic variables. The proposed model was validated with a nonlinear, heteroscedastic, lag-1 time dependent test function. The validation results of the test function show that the key statistics, nonlinear dependence, and heteroscedascity of the test data are reproduced well by the KLR model. In contrast, a traditional resampling technique, KNN resampling (KNNR), shows some biases with respect to key statistics, such as the variance and lag-1 correlation. Furthermore, the proposed KLR model was used to simulate the annual minimum of the consecutive 7-day average daily mean flow (Min7D) of the Romaine River, Quebec. The observed and extended North Atlantic Oscillation (NAO) index is incorporated into the model. The case study results of the observed period illustrate that the KLR model sufficiently reproduced key statistics and the nonlinear heteroscedasticity relation. For the future period, a lower mean is observed, which indicates that drier conditions other than normal might be expected in the next decade in the Romaine River. Overall, it is concluded that the KLR model can be a good alternative for simulating irregular and nonlinear behaviors in hydroclimatic variables.
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
Sánchez, J P; Misztal, I; Aguilar, I; Bertrand, J K
2008-02-01
The objective of this study was to examine the feasibility of using random regression-spline (RR-spline) models for fitting growth traits in a multibreed beef cattle population. To meet the objective, the results from the RR-spline model were compared with the widely used multitrait (MT) model when both were fit to a data set (1.8 million records and 1.1 million animals) provided by the American Gelbvieh Association. The effect of prior information on the EBV of sires was also investigated. In both RR-spline and MT models, the following effects were considered: individual direct and maternal additive genetic effects, contemporary group, age of the animal at measurement, direct and maternal heterosis, and direct and maternal additive genetic mean effect of the breed. Additionally, the RR-spline model included an individual direct permanent environmental effect. When both MT and RR-spline models were applied to a data set containing records for weaning weight (WWT) and yearling weight (YWT) within specified age ranges, the rankings of bulls' direct EBV (as measured via Pearson correlations) provided by both models were comparable, with slightly greater differences in the reranking of bulls observed for YWT evaluations (>or=0.99 for BWT and WWT and >or=0.98 for YWT); also, some bulls dropped from the top 100 list when these lists were compared across methods. For maternal effects, the estimated correlations were slightly smaller, particularly for YWT; again, some drops from the top 100 animals were observed. As in regular MT multibreed genetic evaluations, the heterosis effects and the additive genetic effects of the breed could not be estimated from field data, because there were not enough contemporary groups with the proper composition of purebred and crossbred animals; thus, prior information based on literature values had to be included. The inclusion of prior information had a negligible effect in the overall ranking for bulls with greater than 20 birth weight
Tapanuli Organoclay Addition Into Linear Low Density Polyethylene-Pineapple Fiber Composites
Adawiyah, Robiatul; Juwono, Ariadne L.; Roseno, Seto
2010-12-23
Linear low density polyethylene-Tapanuli organoclay-pineapple fiber composites were succesfully synthesized by a melt intercalation method. The clay was modified as an organoclay by a cation exchange reaction using hexadecyl trimethyl ammonium bromide (HDTMABr) surfactant. The X-ray diffraction results of the organoclay exhibited a higher basal spacing of 1.87 nm compared to the unmodified clay of 1.46 nm. The composite tensile strength was enhanced up to 46.4% with the 1 wt% organoclay addition. Both tensile and flexural moduli increased up to 150.6% and 43% with the 3 wt% organoclay addition to the composites. However, the flexural strength of the composites was not improved with the organoclay addition. The addition of organoclay has also decreased the heat deflection temperature of the composites.
Tapanuli Organoclay Addition Into Linear Low Density Polyethylene-Pineapple Fiber Composites
NASA Astrophysics Data System (ADS)
Adawiyah, Robiatul; Juwono, Ariadne L.; Roseno, Seto
2010-12-01
Linear low density polyethylene-Tapanuli organoclay-pineapple fiber composites were succesfully synthesized by a melt intercalation method. The clay was modified as an organoclay by a cation exchange reaction using hexadecyl trimethyl ammonium bromide (HDTMABr) surfactant. The X-ray diffraction results of the organoclay exhibited a higher basal spacing of 1.87 nm compared to the unmodified clay of 1.46 nm. The composite tensile strength was enhanced up to 46.4% with the 1 wt% organoclay addition. Both tensile and flexural moduli increased up to 150.6% and 43% with the 3 wt% organoclay addition to the composites. However, the flexural strength of the composites was not improved with the organoclay addition. The addition of organoclay has also decreased the heat deflection temperature of the composites.
Shahab, Yosif A; Khalil, Rabah A
2006-10-01
A new approach to NMR chemical shift additivity parameters using simultaneous linear equation method has been introduced. Three general nitrogen-15 NMR chemical shift additivity parameters with physical significance for aliphatic amines in methanol and cyclohexane and their hydrochlorides in methanol have been derived. A characteristic feature of these additivity parameters is the individual equation can be applied to both open-chain and rigid systems. The factors that influence the (15)N chemical shift of these substances have been determined. A new method for evaluating conformational equilibria at nitrogen in these compounds using the derived additivity parameters has been developed. Conformational analyses of these substances have been worked out. In general, the results indicate that there are four factors affecting the (15)N chemical shift of aliphatic amines; paramagnetic term (p-character), lone pair-proton interactions, proton-proton interactions, symmetry of alkyl substituents and molecular association.
Sharon Falcone Miller; Bruce G. Miller
2007-12-15
This paper compares the emissions factors for a suite of liquid biofuels (three animal fats, waste restaurant grease, pressed soybean oil, and a biodiesel produced from soybean oil) and four fossil fuels (i.e., natural gas, No. 2 fuel oil, No. 6 fuel oil, and pulverized coal) in Penn State's commercial water-tube boiler to assess their viability as fuels for green heat applications. The data were broken into two subsets, i.e., fossil fuels and biofuels. The regression model for the liquid biofuels (as a subset) did not perform well for all of the gases. In addition, the coefficient in the models showed the EPA method underestimating CO and NOx emissions. No relation could be studied for SO{sub 2} for the liquid biofuels as they contain no sulfur; however, the model showed a good relationship between the two methods for SO{sub 2} in the fossil fuels. AP-42 emissions factors for the fossil fuels were also compared to the mass balance emissions factors and EPA CFR Title 40 emissions factors. Overall, the AP-42 emissions factors for the fossil fuels did not compare well with the mass balance emissions factors or the EPA CFR Title 40 emissions factors. Regression analysis of the AP-42, EPA, and mass balance emissions factors for the fossil fuels showed a significant relationship only for CO{sub 2} and SO{sub 2}. However, the regression models underestimate the SO{sub 2} emissions by 33%. These tests illustrate the importance in performing material balances around boilers to obtain the most accurate emissions levels, especially when dealing with biofuels. The EPA emissions factors were very good at predicting the mass balance emissions factors for the fossil fuels and to a lesser degree the biofuels. While the AP-42 emissions factors and EPA CFR Title 40 emissions factors are easier to perform, especially in large, full-scale systems, this study illustrated the shortcomings of estimation techniques. 23 refs., 3 figs., 8 tabs.
NASA Astrophysics Data System (ADS)
Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.
2016-06-01
Diameter-at-Breast-Height Estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR Technology has a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point cloud unique in different forest classes. Extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and types of species were identified to compare to LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20m, 10m, and 5m grid resolutions. To know the best combination of parameters in DBH Estimation, all possible combinations of parameters were generated and automated using python scripts and additional regression related libraries such as Numpy, Scipy, and Scikit learn were used. The combination that yields the highest r-squared or coefficient of determination and lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at 10mgrid size and at of 0.604 r-squared, 154.04 AIC and 175.08 BIC. Combination of parameters may differ among forest classes for further studies. Additional statistical tests can be supplemented to help determine the correlation among parameters such as Kaiser- Meyer-Olkin (KMO) Coefficient and the Barlett's Test for Spherecity (BTS).
Enhancing the linear flow of fine granules through the addition of elongated particles
Guo, Zhiguo; Chen, Xueli; Xu, Yang; Liu, Haifeng
2015-01-01
Sandglasses have been used to record time for thousands of years because of their constant flow rates; however, they now are drawing attention for their substantial scientific importance and extensive industrial applications. The presence of elongated particles in a binary granular system is believed to result in undesired flow because their shape implies a larger resistance to flow. However, our experiments demonstrate that the addition of elongated particles can substantially reduce the flow fluctuation of fine granules and produce a stable linear flow similar to that in an hourglass. On the basis of experimental data and previous reports of flow dynamics, we observed that the linear flow is driven by the “needle particle effect,” including flow orientation, reduced agglomeration, and local perturbation. This phenomenon is observed in several binary granular systems, including fine granules and secondary elongated particles, which demonstrates that our simple method can be widely applied to the accurate measurement of granular flows in industry. PMID:26551736
Investigations of non-linear polymers as high performance lubricant additives
Robinson, Joshua W.; Bhattacharya, Priyanka; Qu, Jun; Bays, J. Timothy; Cosimbescu, Lelia
2015-03-22
Off-the-shelf available engine oils contain an assortment of additives that increase the performance of base oils and maximize the overall efficiency of the machine. With ever increasing requirements for fuel efficiency, the demand for novel materials that outperform older generations is also on the rise. One approach towards increasing overall efficiency is to reduce internal friction and wear in an engine. From an additive approach, this is typically achieved by altering the bulk oil’s viscosity at high temperatures via polymers. In general, the hydrodynamic volume of polymers increase (expand) at elevated temperatures and decrease (contract/deflate) with declining temperatures and this effect is enhanced be carefully designing specific structures and architectures. The natural thinning tendency of base oil with increasing temperatures is in part mitigated by the expansion of the macromolecules added, and the overall effect is decreasing the viscosity losses at high temperatures. Traditional polymer architectures vary from linear to dendritic, where linear polymers of the same chemical composition and molecular weight to its dendritic counterpart will undergo a more significant free volume change in solution with regards to temperature changes. This advantage has been exploited in the literature towards the production of viscosity modifiers. However, one major disadvantage of linear polymers is degradation due to mechanical shear forces and high temperatures causing a shorter additive lifetime. Dendrimers on the other hand are known to demonstrate superior robustness to shear degradation when compared to their respective linear counterparts. An additional advantage of the dendritic architecture is the ability to tailor the peripheral end-groups towards influencing polymer-solvent and/or polymer-surface interactions. Comb-burst hyperbranched polymers are a hybrid of the aforementioned architectures and provide several compromises between the traditional
Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi
2013-09-01
Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore
Babapour, R; Naghdi, R; Ghajar, I; Ghodsi, R
2015-07-01
Rock proportion of subsoil directly influences the cost of embankment in forest road construction. Therefore, developing a reliable framework for rock ratio estimation prior to the road planning could lead to more light excavation and less cost operations. Prediction of rock proportion was subjected to statistical analyses using the application of Artificial Neural Network (ANN) in MATLAB and five link functions of ordinal logistic regression (OLR) according to the rock type and terrain slope properties. In addition to bed rock and slope maps, more than 100 sample data of rock proportion were collected, observed by geologists, from any available bed rock of every slope class. Four predictive models were developed for rock proportion, employing independent variables and applying both the selected probit link function of OLR and Layer Recurrent and Feed forward back propagation networks of Neural Networks. In ANN, different numbers of neurons are considered for the hidden layer(s). Goodness of the fit measures distinguished that ANN models produced better results than OLR with R (2) = 0.72 and Root Mean Square Error = 0.42. Furthermore, in order to show the applicability of the proposed approach, and to illustrate the variability of rock proportion resulted from the model application, the optimum models were applied to a mountainous forest in where forest road network had been constructed in the past.
Improving Brush Polymer Infrared One-Dimensional Photonic Crystals via Linear Polymer Additives
Macfarlane, Robert J.; Kim, Bongkeun; Lee, Byeongdu; Weitekamp, Raymond A.; Bates, Christopher M.; Lee, Siu Fung; Chang, Alice B.; Delaney, Kris T.; Fredrickson, Glen H.; Atwater, Harry A.; Grubbs, Robert H.
2014-12-17
Brush block copolymers (BBCPs) enable the rapid fabrication of self-assembled one-dimensional photonic crystals with photonic band gaps that are tunable in the UV-vis-IR, where the peak wavelength of reflection scales with the molecular weight of the BBCPs. Due to the difficulty in synthesizing very large BBCPs, the fidelity of the assembled lamellar nanostructures drastically erodes as the domains become large enough to reflect IR light, severely limiting their performance as optical filters. To overcome this challenge, short linear homopolymers are used to swell the arrays to ~180% of the initial domain spacing, allowing for photonic band gaps up to~1410 nm without significant opacity in the visible, demonstrating improved ordering of the arrays. Additionally, blending BBCPs with random copolymers enables functional groups to be incorporated into the BBCP array without attaching them directly to the BBCPs. The addition of short linear polymers to the BBCP arrays thus offers a facile means of improving the self-assembly and optical properties of these materials, as well as adding a route to achieving films with greater functionality and tailorability, without the need to develop or optimize the processing conditions for each new brush polymer synthesized.
Bernardo, R
1996-11-01
Best linear unbiased prediction (BLUP) has been found to be useful in maize (Zea mays L.) breeding. The advantage of including both testcross additive and dominance effects (Intralocus Model) in BLUP, rather than only testcross additive effects (Additive Model), has not been clearly demonstrated. The objective of this study was to compare the usefulness of Intralocus and Additive Models for BLUP of maize single-cross performance. Multilocation data from 1990 to 1995 were obtained from the hybrid testing program of Limagrain Genetics. Grain yield, moisture, stalk lodging, and root lodging of untested single crosses were predicted from (1) the performance of tested single crosses and (2) known genetic relationships among the parental inbreds. Correlations between predicted and observed performance were obtained with a delete-one cross-validation procedure. For the Intralocus Model, the correlations ranged from 0.50 to 0.66 for yield, 0.88 to 0.94 for moisture, 0.47 to 0.69 for stalk lodging, and 0.31 to 0.45 for root lodging. The BLUP procedure was consistently more effective with the Intralocus Model than with the Additive Model. When the Additive Model was used instead of the Intralocus Model, the reductions in the correlation were largest for root lodging (0.06-0.35), smallest for moisture (0.00-0.02), and intermediate for yield (0.02-0.06) and stalk lodging (0.02-0.08). The ratio of dominance variance (v D) to total genetic variance (v G) was highest for root lodging (0.47) and lowest for moisture (0.10). The Additive Model may be used if prior information indicates that VD for a given trait has little contribution to VG. Otherwise, the continued use of the Intralocus Model for BLUP of single-cross performance is recommended.
Levine, Matthew E; Albers, David J; Hripcsak, George
2016-01-01
Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models’ explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data. PMID:28269874
Effect of Sb addition on linear and non-linear optical properties of amorphous Ge-Se-Sn thin films
NASA Astrophysics Data System (ADS)
Sharma, Navjeet; Sharma, Surbhi; Sarin, Amit; Kumar, Rajesh
2016-01-01
Optical characterization of amorphous thin films of Ge20Sn10Se70-xSbx (x = 0, 3, 6, 9, 12, 15) has been carried out. Thin films were deposited onto pre cleaned glass substrates using thermal evaporation technique. Transmission spectra of the films were recorded, for normal incidence, in range 400-2400 nm. Refractive index of the films was calculated using the envelope method by Swanepoel. Dispersion analysis has been carried out using single effective oscillator model. Other optical constants such as absorption coefficients, extinction coefficients have also been evaluated. Tauc plots were used to evaluate the optical band gap. The refractive index has been found to be increasing while the band gap decreases with increasing Sb concentration. The observed optical behavior of the films has been explained using chemical bond approach. Cohesive energy is found to be decreasing in the present work, which reflects that bond strength decreases with the increasing content of Sb. Non-linear optical parameters (i.e. n2 and χ(3)) have been derived from linear optical parameters (i.e. n, k, Eg). Observed changes in linear and non-linear parameters have been reported in this study.
Esbaugh, A J; Brix, K V; Mager, E M; Grosell, M
2011-09-01
The current study examined the acute toxicity of lead (Pb) to Ceriodaphnia dubia and Pimephales promelas in a variety of natural waters. The natural waters were selected to range in pertinent water chemistry parameters such as calcium, pH, total CO(2) and dissolved organic carbon (DOC). Acute toxicity was determined for C. dubia and P. promelas using standard 48h and 96h protocols, respectively. For both organisms acute toxicity varied markedly according to water chemistry, with C. dubia LC50s ranging from 29 to 180μg/L and P. promelas LC50s ranging from 41 to 3598μg/L. Additionally, no Pb toxicity was observed for P. promelas in three alkaline natural waters. With respect to water chemistry parameters, DOC had the strongest protective impact for both organisms. A multi-linear regression (MLR) approach combining previous lab data and the current data was used to identify the relative importance of individual water chemistry components in predicting acute Pb toxicity for both species. As anticipated, the P. promelas best-fit MLR model combined DOC, calcium and pH. Unexpectedly, in the C. dubiaMLR model the importance of pH, TCO(2) and calcium was minimal while DOC and ionic strength were the controlling water quality variables. Adjusted R(2) values of 0.82 and 0.64 for the P. promelas and C. dubia models, respectively, are comparable to previously developed biotic ligand models for other metals.
Linear QoS goals of additive and concave metrics in ad hoc cognitive packet routing.
Lent, Ricardo
2006-12-01
This paper addresses two scalability problems related to the cognitive map of packets in ad hoc cognitive packet networks and proposes a solution. Previous works have included latency as part of the routing goal of smart packets, which requires packets to collect their arrival time at each node in a path. Such a requirement resulted in a packet overhead proportional to the path length. The second problem is that the multiplicative form of path availability, which was employed to measure resources, loses accuracy in long paths. To solve these problems, new goals are proposed in this paper. These goals are linear functions of low-overhead metrics and can provide similar performance results with lower cost. One direct result shown in simulation is that smart packets driven by a linear function of path length and buffer occupancy can effectively balance the traffic of multiple flows without the large overhead that would be needed if round-trip delay was used. In addition, energy-aware routing is also studied under this scheme as well as link selection based on their expected level of security.
Cadmium-hazard mapping using a general linear regression model (Irr-Cad) for rapid risk assessment.
Simmons, Robert W; Noble, Andrew D; Pongsakul, P; Sukreeyapongse, O; Chinabut, N
2009-02-01
Research undertaken over the last 40 years has identified the irrefutable relationship between the long-term consumption of cadmium (Cd)-contaminated rice and human Cd disease. In order to protect public health and livelihood security, the ability to accurately and rapidly determine spatial Cd contamination is of high priority. During 2001-2004, a General Linear Regression Model Irr-Cad was developed to predict the spatial distribution of soil Cd in a Cd/Zn co-contaminated cascading irrigated rice-based system in Mae Sot District, Tak Province, Thailand (Longitude E 98 degrees 59'-E 98 degrees 63' and Latitude N 16 degrees 67'-16 degrees 66'). The results indicate that Irr-Cad accounted for 98% of the variance in mean Field Order total soil Cd. Preliminary validation indicated that Irr-Cad 'predicted' mean Field Order total soil Cd, was significantly (p < 0.001) correlated (R (2) = 0.92) with 'observed' mean Field Order total soil Cd values. Field Order is determined by a given field's proximity to primary outlets from in-field irrigation channels and subsequent inter-field irrigation flows. This in turn determines Field Order in Irrigation Sequence (Field Order(IS)). Mean Field Order total soil Cd represents the mean total soil Cd (aqua regia-digested) for a given Field Order(IS). In 2004-2005, Irr-Cad was utilized to evaluate the spatial distribution of total soil Cd in a 'high-risk' area of Mae Sot District. Secondary validation on six randomly selected field groups verified that Irr-Cad predicted mean Field Order total soil Cd and was significantly (p < 0.001) correlated with the observed mean Field Order total soil Cd with R (2) values ranging from 0.89 to 0.97. The practical applicability of Irr-Cad is in its minimal input requirements, namely the classification of fields in terms of Field Order(IS), strategic sampling of all primary fields and laboratory based determination of total soil Cd (T-Cd(P)) and the use of a weighed coefficient for Cd (Coeff
NASA Astrophysics Data System (ADS)
Denli, H. H.; Koc, Z.
2015-12-01
Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis construct mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Considering regression analysis for real estate valuation, which are presented in real marketing process with its current characteristics and quantifiers, the method will help us to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors having the house, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Same price valued locations have been found and plotted on the map and equivalence curves have been drawn identifying the same valued zones as lines.
Castle, Brian T; Odde, David J
2013-12-03
The structure and free energy of multistranded linear polymer ends evolves as individual subunits are added and lost. Thus, the energetic state of the polymer end is not constant, as assembly theory has assumed. Here we utilize a Brownian dynamics approach to simulate the addition and loss of individual subunits at the polymer tip. Using the microtubule as a primary example, we examined how the structure of the polymer tip dictates the rate at which units are added to and lost from individual protofilaments. We find that freely diffusing subunits arrive less frequently to lagging protofilaments but bind more efficiently, such that there is no kinetic difference between leading and lagging protofilaments within a tapered tip. However, local structure at the nanoscale has up to an order-of-magnitude effect on the rate of addition. Thus, the kinetic on-rate constant, integrated across the microtubule tip (kon,MT), is an ensemble average of the varying individual protofilament on-rate constants (kon,PF). Our findings have implications for both catastrophe and rescue of the dynamic microtubule end, and provide a subnanoscale framework for understanding the mechanism of action of microtubule-associated proteins and microtubule-directed drugs. Although we utilize the specific example of the microtubule here, the findings are applicable to multistranded polymers generally.
NASA Astrophysics Data System (ADS)
Vogel, Pierre
Double exocyclic 1,3-dienes such as 2,3,5,6-tetramethylidene-7-oxabicyclo[2.2.1]heptane and its 1-substituted derivatives undergo two successive Diels-Alder additions with large reactivity difference between the addition of the first equivalent (k 1) and the second equivalent (k 2) of dienophile. This allows one to prepare, through parallel synthesis, a large number of linearly condensed polycyclic systems containing three annulated six-membered rings, including naphthacenyl systems and anthracyclinones. The large k 1/k 2 rate constant ratio is a consequence of the Dimroth principle, the first cycloaddition being significantly more exothermic then the second one. Control of regio- and stereoselectivity of the two successive cycloadditions is possible by 1-substitution of the 2,3,5,6-tetramethylidene-7-oxabicyclo[2.2.1]heptane, for instance by a 1-(dimethoxymethyl) group, or by stereoselective disubstitution of the double diene by arenesulfenyl substituents. Enantiomerically pure anthracyclinones and analogues are obtained using enantiomerically pure dienophiles such as 3-oxo-but-2-en-2-yl esters. The chemistry so-developed has allowed the preparation of enantiomerically pure 6-((aminoalkoxy)oxy)methyl-6,7-dideoxyidarubicinones that are DNA intercalators and inhibitors of topoisomerase II-induced DNA strained religation.
Seow, Wei Jie; Pesatori, Angela Cecilia; Dimont, Emmanuel; Farmer, Peter B.; Albetti, Benedetta; Ettinger, Adrienne S.; Bollati, Valentina; Bolognesi, Claudia; Roggieri, Paola; Panev, Teodor I.; Georgieva, Tzveta; Merlo, Domenico Franco; Bertazzi, Pier Alberto; Baccarelli, Andrea A.
2012-01-01
Chronic occupational exposure to benzene is associated with an increased risk of hematological malignancies such as acute myeloid leukemia (AML), but the underlying mechanisms are still unclear. The main objective of this study was to investigate the association between benzene exposure and DNA methylation, both in repeated elements and candidate genes, in a population of 158 Bulgarian petrochemical workers and 50 unexposed office workers. Exposure assessment included personal monitoring of airborne benzene at work and urinary biomarkers of benzene metabolism (S-phenylmercapturic acid [SPMA] and trans,trans-muconic acid [t,t-MA]) at the end of the work-shift. The median levels of airborne benzene, SPMA and t,t-MA in workers were 0.46 ppm, 15.5 µg/L and 711 µg/L respectively, and exposure levels were significantly lower in the controls. Repeated-element DNA methylation was measured in Alu and LINE-1, and gene-specific methylation in MAGE and p15. DNA methylation levels were not significantly different between exposed workers and controls (P>0.05). Both ordinary least squares (OLS) and beta-regression models were used to estimate benzene-methylation associations. Beta-regression showed better model specification, as reflected in improved coefficient of determination (pseudo R2) and Akaike’s information criterion (AIC). In beta-regression, we found statistically significant reductions in LINE-1 (−0.15%, P<0.01) and p15 (−0.096%, P<0.01) mean methylation levels with each interquartile range (IQR) increase in SPMA. This study showed statistically significant but weak associations of LINE-1 and p15 hypomethylation with SPMA in Bulgarian petrochemical workers. We showed that beta-regression is more appropriate than OLS regression for fitting methylation data. PMID:23227177
Gerber, Samuel; Rubel, Oliver; Bremer, Peer -Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-19
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.
Gerber, Samuel; Rübel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-01
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduce a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse-Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this paper introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to over-fitting. The Morse-Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse-Smale regression. Supplementary materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse-Smale complex approximation and additional tables for the climate-simulation study. PMID:23687424
Brasquet, C.; Bourges, B.; Le Cloirec, P.
1999-12-01
The adsorption of 55 organic compounds is carried out onto a recently discovered adsorbent, activated carbon cloth. Isotherms are modeled using the Freundlich classical model, and the large database generated allows qualitative assumptions about the adsorption mechanism. However, to confirm these assumptions, a quantitative structure-property relationship methodology is used to assess the correlations between an adsorbability parameter (expressed using the Freundlich parameter K) and topological indices related to the compounds molecular structure (molecular connectivity indices, MCI). This correlation is set up by mean of two different statistical tools, multiple linear regression (MLR) and neural network (NN). A principal component analysis is carried out to generate new and uncorrelated variables. It enables the relations between the MCI to be analyzed, but the multiple linear regression assessed using the principal components (PCs) has a poor statistical quality and introduces high order PCs, too inaccurate for an explanation of the adsorption mechanism. The correlations are thus set up using the original variables (MCI), and both statistical tools, multiple linear regression and neutral network, are compared from a descriptive and predictive point of view. To compare the predictive ability of both methods, a test database of 10 organic compounds is used.
Fragkaki, A G; Farmaki, E; Thomaidis, N; Tsantili-Kakoulidou, A; Angelis, Y S; Koupparis, M; Georgakopoulos, C
2012-09-21
The comparison among different modelling techniques, such as multiple linear regression, partial least squares and artificial neural networks, has been performed in order to construct and evaluate models for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. The performance of the quantitative structure-retention relationship study, using the multiple linear regression and partial least squares techniques, has been previously conducted. In the present study, artificial neural networks models were constructed and used for the prediction of relative retention times of anabolic androgenic steroids, while their efficiency is compared with that of the models derived from the multiple linear regression and partial least squares techniques. For overall ranking of the models, a novel procedure [Trends Anal. Chem. 29 (2010) 101-109] based on sum of ranking differences was applied, which permits the best model to be selected. The suggested models are considered useful for the estimation of relative retention times of designer steroids for which no analytical data are available.
NASA Astrophysics Data System (ADS)
Marrouni, Karim El; Abou-Rachid, Hakima; Kaliaguine, Serge
This work investigates the chemical reactivity of four linear symmetrical monoethers with molecular oxygen. Such oxygenated compounds may be considered as potential diesel fuel additives in order to reduce the ignition delay in diesel fuel engines. For this purpose, a kinetic study is proposed to clarify the relation between the molecular structure of the fuel molecule and its ignition properties. To this end, DFT calculations were performed for these reactions using B3LYP/6-311G(d,p) and BH&HLYP/6-311G(d,p) to determine structures, energies, and vibrational frequencies of stationary points as well as activated complexes involved in each gas-phase combustion initiation reaction of the monoethers CH3OCH3, C2H5OC2H5, C3H7OC3H7, or C4H9OC4H9 with molecular oxygen. This theoretical kinetic study was carried out using electronic structure results and the transition state theory, to assess the rate constants for all studied combustion reactions. As it has been shown in our previous work [Abou-Rachid et al., J Mol Struct (Theochem) 2003, 621, 293], the cetane number (CN) of a pure organic molecule depends on the initiation rate of its homogeneous gas-phase reaction with molecular oxygen. Indeed, the calculated initiation rate constants of the H-abstraction process of linear monoethers with O2 show a very good correlation with experimental CN data of these pure compounds at T D 1,000 K. This temperature is representative of the operating conditions of a diesel fuel engine.0
Zou, Kelly H.; O’Malley, A. James
2005-01-01
Receiver operating characteristic (ROC) analysis is a useful evaluative method of diagnostic accuracy. A Bayesian hierarchical nonlinear regression model for ROC analysis was developed. A validation analysis of diagnostic accuracy was conducted using prospective multi-center clinical trial prostate cancer biopsy data collected from three participating centers. The gold standard was based on radical prostatectomy to determine local and advanced disease. To evaluate the diagnostic performance of PSA level at fixed levels of Gleason score, a normality transformation was applied to the outcome data. A hierarchical regression analysis incorporating the effects of cluster (clinical center) and cancer risk (low, intermediate, and high) was performed, and the area under the ROC curve (AUC) was estimated. PMID:16161801
NASA Astrophysics Data System (ADS)
Rudy, Ashley C. A.; Lamoureux, Scott F.; Treitz, Paul; van Ewijk, Karin Y.
2016-07-01
To effectively assess and mitigate risk of permafrost disturbance, disturbance-prone areas can be predicted through the application of susceptibility models. In this study we developed regional susceptibility models for permafrost disturbances using a field disturbance inventory to test the transferability of the model to a broader region in the Canadian High Arctic. Resulting maps of susceptibility were then used to explore the effect of terrain variables on the occurrence of disturbances within this region. To account for a large range of landscape characteristics, the model was calibrated using two locations: Sabine Peninsula, Melville Island, NU, and Fosheim Peninsula, Ellesmere Island, NU. Spatial patterns of disturbance were predicted with a generalized linear model (GLM) and generalized additive model (GAM), each calibrated using disturbed and randomized undisturbed locations from both locations and GIS-derived terrain predictor variables including slope, potential incoming solar radiation, wetness index, topographic position index, elevation, and distance to water. Each model was validated for the Sabine and Fosheim Peninsulas using independent data sets while the transferability of the model to an independent site was assessed at Cape Bounty, Melville Island, NU. The regional GLM and GAM validated well for both calibration sites (Sabine and Fosheim) with the area under the receiver operating curves (AUROC) > 0.79. Both models were applied directly to Cape Bounty without calibration and validated equally with AUROC's of 0.76; however, each model predicted disturbed and undisturbed samples differently. Additionally, the sensitivity of the transferred model was assessed using data sets with different sample sizes. Results indicated that models based on larger sample sizes transferred more consistently and captured the variability within the terrain attributes in the respective study areas. Terrain attributes associated with the initiation of disturbances were
Dashtbozorgi, Zahra; Golmohammadi, Hassan
2010-12-01
The main aim of this study was the development of a quantitative structure-property relationship method using an artificial neural network (ANN) for predicting the water-to-wet butyl acetate partition coefficients of organic solutes. As a first step, a genetic algorithm-multiple linear regression model was developed; the descriptors appearing in this model were considered as inputs for the ANN. These descriptors are principal moment of inertia C (I(C)), area-weighted surface charge of hydrogen-bonding donor atoms (HACA-2), Kier and Hall index (order 2) ((2)χ), Balaban index (J), minimum bond order of a C atom (P(C)) and relative negative-charged SA (RNCS). Then a 6-4-1 neural network was generated for the prediction of water-to-wet butyl acetate partition coefficients of 76 organic solutes. By comparing the results obtained from multiple linear regression and ANN models, it can be seen that statistical parameters (Fisher ratio, correlation coefficient and standard error) of the ANN model are better than that regression model, which indicates that nonlinear model can simulate the relationship between the structural descriptors and the partition coefficients of the investigated molecules more accurately.
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
The logistic regression originally is intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in introduction of the chapter. The observations are possibly got individually, then we speak of binary logistic regression. When they are grouped, the logistic regression is said binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihoods ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effect, individual effect, selection of variables to build a model, measure of the fitness of the model, prediction of new values… . The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
NASA Astrophysics Data System (ADS)
Schreiber, P.; Forbrich, I.; Kutzbach, L.; Hormann, A.; Wolf, U.; Miglovec, M.; Pihlatie, M.; Christiansen, J. R.; Wilmking, M.
2009-04-01
Closed chambers are the most common method to determine methane (CH4) fluxes in peatlands. The concentration change over time is monitored, and the flux is usually calculated by the slope of a linear regression function. However, chambers tend to slow down the gas diffusion by changing the concentration gradient between soil and atmosphere. Theoretically, this would result in a near-exponential concentration change in the chamber headspace. Here, we present data from a laboratory experiment and from two field campaigns on the basis of which we evaluate flux calculation approaches based either on linear or exponential regression models. To compare the fit performances of the two models, we used the Akaike Information Criterion with small sample second order bias correction (AICc). For checking the quality of flux data, we used the standard deviation of residuals. The calibration system in the laboratory experiment used during the chamber calibration campaign at Hyytiälä Forestry Field Station in August 2008 has been described by Pumpanen et al. (2004). Five different flux levels on two different soil porosities where tested. Preliminary results show that most concentration-over-time datasets were best described by the exponential model as evaluated by the AICc. It appeared that the flux calculation using the exponential model was better suited to determine the preset fluxes than that using the linear model. In the dataset of the first field campaign (April to October 2007) from Salmisuo (Finland, 62.46Ë N, 30.58Ë E), however, the majority of fluxes was best fitted with a linear regression on all microsite types. Those fluxes which are best fitted exponentially are most probable due to chamber artefacts. They occurred mostly during a drought period in August 2007, which seemed to increase the artificial impact of the chamber. However, these results might be site-specific: In Ust-Pojeg (Russia, 61.56Ë N, 50.13Ë E), where CH4 emissions are supposed to be
Azadi, Sama; Karimi-Jashni, Ayoub
2016-02-01
Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate.
Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar
2016-01-01
Two novel and eco friendly adsorbents namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from wood tree Pistacia atlantica (AC-PAW) were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal with various adsorption influential parameters was well modeled and optimized using multiple linear regressions (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were found based on γ value of 0.76 and σ(2) of 0.15. For testing the data set, the mean square error (MSE) values of 0.0010 and the coefficient of determination (R(2)) values of 0.976 were obtained for LSSVR model, and the MSE value of 0.0037 and the R(2) value of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data was found to be well fitted and in good agreement with Langmuir isotherm model and second-order equation and intra-particle diffusion models respectively. The small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is applicable for successful rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g(-1) and 125 mg g(-1) respectively.
NASA Technical Reports Server (NTRS)
Smalheer, C. V.
1973-01-01
The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.
NASA Astrophysics Data System (ADS)
Sumihara, K.
Based upon legitimate variational principles, one microscopic-macroscopic finite element formulation for linear dynamics is presented by Hybrid Stress Finite Element Method. The microscopic application of Geometric Perturbation introduced by Pian and the introduction of infinitesimal limit core element (Baby Element) have been consistently combined according to the flexible and inherent interpretation of the legitimate variational principles initially originated by Pian and Tong. The conceptual development based upon Hybrid Finite Element Method is extended to linear dynamics with the introduction of physically meaningful higher modes.
Lambert, Ronald J W; Mytilinaios, Ioannis; Maitland, Luke; Brown, Angus M
2012-08-01
This study describes a method to obtain parameter confidence intervals from the fitting of non-linear functions to experimental data, using the SOLVER and Analysis ToolPaK Add-In of the Microsoft Excel spreadsheet. Previously we have shown that Excel can fit complex multiple functions to biological data, obtaining values equivalent to those returned by more specialized statistical or mathematical software. However, a disadvantage of using the Excel method was the inability to return confidence intervals for the computed parameters or the correlations between them. Using a simple Monte-Carlo procedure within the Excel spreadsheet (without recourse to programming), SOLVER can provide parameter estimates (up to 200 at a time) for multiple 'virtual' data sets, from which the required confidence intervals and correlation coefficients can be obtained. The general utility of the method is exemplified by applying it to the analysis of the growth of Listeria monocytogenes, the growth inhibition of Pseudomonas aeruginosa by chlorhexidine and the further analysis of the electrophysiological data from the compound action potential of the rodent optic nerve.
NASA Astrophysics Data System (ADS)
Ibarra-Berastegi, G.; Saénz, J.; Ezcurra, A.; Elías, A.; Diaz Argandoña, J.; Errasti, I.
2011-06-01
In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict from large-scale atmospheric fields, surface moisture flux and daily precipitation at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) during the 1961-2001 period. Three types of downscaling models have been built: (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. With the aim to reduce the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. Available daily data has been divided into two parts: a training period used to find a group of about 300 analogues to build the downscaling model (1961-1996) and a test period (1997-2001), where models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period, yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capabilities can be attributed to the analogues-calculation stage.
NASA Astrophysics Data System (ADS)
Ibarra-Berastegi, G.; Saénz, J.; Ezcurra, A.; Elías, A.; Diaz de Argandoña, J.; Errasti, I.
2011-02-01
In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict from large-scale atmospheric fields surface moisture flux and daily precipitation at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) during the 1961-2001 period. Three types of downscaling models have been built (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. With the aim to reduce the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. Available daily data has been divided into two parts: a training period used to find a group of about 300 analogues to build the downscaling model (1961-1996) and a test period (1997-2001), where models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period, yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capabilities can be attributted to the analogues-calculation stage.
Chen, Qingxia; Ibrahim, Joseph G
2014-07-01
Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.
Yano, Kentaro; Mita, Suzune; Morimoto, Kaori; Haraguchi, Tamami; Arakawa, Hiroshi; Yoshida, Miyako; Yamashita, Fumiyoshi; Uchida, Takahiro; Ogihara, Takuo
2015-09-01
P-glycoprotein (P-gp) regulates absorption of many drugs in the gastrointestinal tract and their accumulation in tumor tissues, but the basis of substrate recognition by P-gp remains unclear. Bitter-tasting phenylthiocarbamide, which stimulates taste receptor 2 member 38 (T2R38), increases P-gp activity and is a substrate of P-gp. This led us to hypothesize that bitterness intensity might be a predictor of P-gp-inhibitor/substrate status. Here, we measured the bitterness intensity of a panel of P-gp substrates and nonsubstrates with various taste sensors, and used multiple linear regression analysis to examine the relationship between P-gp-inhibitor/substrate status and various physical properties, including intensity of bitter taste measured with the taste sensor. We calculated the first principal component analysis score (PC1) as the representative value of bitterness, as all taste sensor's outputs shared significant correlation. The P-gp substrates showed remarkably greater mean bitterness intensity than non-P-gp substrates. We found that Km value of P-gp substrates were correlated with molecular weight, log P, and PC1 value, and the coefficient of determination (R(2) ) of the linear regression equation was 0.63. This relationship might be useful as an aid to predict P-gp substrate status at an early stage of drug discovery.
Marami Milani, Mohammad Reza; Hense, Andreas; Rahmani, Elham; Ploeger, Angelika
2016-07-23
This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow's milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R² (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI show the highest relation (p-value < 0.001) with R² (0.69). For fat and protein the results are only marginal. This method is suggested for the impact studies of climate variability/change on agriculture and food science fields when short-time series or data with large uncertainty are available.
Xu, Hua; Yang, Lanhao; Freitas, Michael A
2008-01-01
Background Rejection of false positive peptide matches in database searches of shotgun proteomic experimental data is highly desirable. Several methods have been developed to use the peptide retention time as to refine and improve peptide identifications from database search algorithms. This report describes the implementation of an automated approach to reduce false positives and validate peptide matches. Results A robust linear regression based algorithm was developed to automate the evaluation of peptide identifications obtained from shotgun proteomic experiments. The algorithm scores peptides based on their predicted and observed reversed-phase liquid chromatography retention times. The robust algorithm does not require internal or external peptide standards to train or calibrate the linear regression model used for peptide retention time prediction. The algorithm is generic and can be incorporated into any database search program to perform automated evaluation of the candidate peptide matches based on their retention times. It provides a statistical score for each peptide match based on its retention time. Conclusion Analysis of peptide matches where the retention time score was included resulted in a significant reduction of false positive matches with little effect on the number of true positives. Overall higher sensitivities and specificities were achieved for database searches carried out with MassMatrix, Mascot and X!Tandem after implementation of the retention time based score algorithm. PMID:18713471
Parinet, Julien; Julien, Maxime; Nun, Pierrick; Robins, Richard J; Remaud, Gerald; Höhener, Patrick
2015-09-01
We aim at predicting the effect of structure and isotopic substitutions on the equilibrium vapour pressure isotope effect of various organic compounds (alcohols, acids, alkanes, alkenes and aromatics) at intermediate temperatures. We attempt to explore quantitative structure property relationships by using artificial neural networks (ANN); the multi-layer perceptron (MLP) and compare the performances of it with multi-linear regression (MLR). These approaches are based on the relationship between the molecular structure (organic chain, polar functions, type of functions, type of isotope involved) of the organic compounds, and their equilibrium vapour pressure. A data set of 130 equilibrium vapour pressure isotope effects was used: 112 were used in the training set and the remaining 18 were used for the test/validation dataset. Two sets of descriptors were tested, a set with all the descriptors: number of(12)C, (13)C, (16)O, (18)O, (1)H, (2)H, OH functions, OD functions, CO functions, Connolly Solvent Accessible Surface Area (CSA) and temperature and a reduced set of descriptors. The dependent variable (the output) is the natural logarithm of the ratios of vapour pressures (ln R), expressed as light/heavy as in classical literature. Since the database is rather small, the leave-one-out procedure was used to validate both models. Considering higher determination coefficients and lower error values, it is concluded that the multi-layer perceptron provided better results compared to multi-linear regression. The stepwise regression procedure is a useful tool to reduce the number of descriptors. To our knowledge, a Quantitative Structure Property Relationship (QSPR) approach for isotopic studies is novel.
Klein, Molly E; Dabbs, David J; Shuai, Yongli; Brufsky, Adam M; Jankowitz, Rachel; Puhalla, Shannon L; Bhargava, Rohit
2013-05-01
Oncotype DX is a commercial assay frequently used for making chemotherapy decisions in estrogen receptor (ER)-positive breast cancers. The result is reported as a recurrence score ranging from 0 to 100, divided into low-risk (<18), intermediate-risk (18-30), and high-risk (≥31) categories. Our pilot study showed that recurrence score can be predicted by an equation incorporating standard morphoimmunohistologic variables (referred to as original Magee equation). Using a data set of 817 cases, we formulated three additional equations (referred to as new Magee equations 1, 2, and 3) to predict the recurrence score category for an independent set of 255 cases. The concordance between the risk category of Oncotype DX and our equations was 54.3%, 55.8%, 59.4%, and 54.4% for original Magee equation, new Magee equations 1, 2, and 3, respectively. When the intermediate category was eliminated, the concordance increased to 96.9%, 100%, 98.6%, and 98.7% for original Magee equation, new Magee equations 1, 2, and 3, respectively. Even when the estimated recurrence score fell in the intermediate category with any of the equations, the actual recurrence score was either intermediate or low in more than 80% of the cases. Any of the four equations can be used to estimate the recurrence score depending on available data. If the estimated recurrence score is clearly high or low, the oncologists should not expect a dramatically different result from Oncotype DX, and the Oncotype DX test may not be needed. Conversely, an Oncotype DX result that is dramatically different from what is expected based on standard morphoimmunohistologic variables should be thoroughly investigated.
NASA Astrophysics Data System (ADS)
Ramoelo, A.; Skidmore, A. K.; Cho, M. A.; Mathieu, R.; Heitkönig, I. M. A.; Dudeni-Tlhone, N.; Schlerf, M.; Prins, H. H. T.
2013-08-01
Grass nitrogen (N) and phosphorus (P) concentrations are direct indicators of rangeland quality and provide imperative information for sound management of wildlife and livestock. It is challenging to estimate grass N and P concentrations using remote sensing in the savanna ecosystems. These areas are diverse and heterogeneous in soil and plant moisture, soil nutrients, grazing pressures, and human activities. The objective of the study is to test the performance of non-linear partial least squares regression (PLSR) for predicting grass N and P concentrations through integrating in situ hyperspectral remote sensing and environmental variables (climatic, edaphic and topographic). Data were collected along a land use gradient in the greater Kruger National Park region. The data consisted of: (i) in situ-measured hyperspectral spectra, (ii) environmental variables and measured grass N and P concentrations. The hyperspectral variables included published starch, N and protein spectral absorption features, red edge position, narrow-band indices such as simple ratio (SR) and normalized difference vegetation index (NDVI). The results of the non-linear PLSR were compared to those of conventional linear PLSR. Using non-linear PLSR, integrating in situ hyperspectral and environmental variables yielded the highest grass N and P estimation accuracy (R2 = 0.81, root mean square error (RMSE) = 0.08, and R2 = 0.80, RMSE = 0.03, respectively) as compared to using remote sensing variables only, and conventional PLSR. The study demonstrates the importance of an integrated modeling approach for estimating grass quality which is a crucial effort towards effective management and planning of protected and communal savanna ecosystems.
Zheng, Xueying; Qin, Guoyou; Tu, Dongsheng
2017-02-19
Motivated by the analysis of quality of life data from a clinical trial on early breast cancer, we propose in this paper a generalized partially linear mean-covariance regression model for longitudinal proportional data, which are bounded in a closed interval. Cholesky decomposition of the covariance matrix for within-subject responses and generalized estimation equations are used to estimate unknown parameters and the nonlinear function in the model. Simulation studies are performed to evaluate the performance of the proposed estimation procedures. Our new model is also applied to analyze the data from the cancer clinical trial that motivated this research. In comparison with available models in the literature, the proposed model does not require specific parametric assumptions on the density function of the longitudinal responses and the probability function of the boundary values and can capture dynamic changes of time or other interested variables on both mean and covariance of the correlated proportional responses. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
NASA Astrophysics Data System (ADS)
Huttunen, Jani; Kokkola, Harri; Mielonen, Tero; Esa Juhani Mononen, Mika; Lipponen, Antti; Reunanen, Juha; Vilhelm Lindfors, Anders; Mikkonen, Santtu; Erkki Juhani Lehtinen, Kari; Kouremeti, Natalia; Bais, Alkiviadis; Niska, Harri; Arola, Antti
2016-07-01
In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during
Baird, Jim; Curry, Robin; Reid, Tim
2013-03-01
This article describes the development and application of a multiple linear regression model to identify how the key elements of waste and recycling infrastructure, namely container capacity and frequency of collection, affect the yield from municipal kerbside recycling programmes. The overall aim of the research was to gain an understanding of the factors affecting the yield from municipal kerbside recycling programmes in Scotland with an underlying objective to evaluate the efficacy of the model as a decision-support tool for informing the design of kerbside recycling programmes. The study isolates the principal kerbside collection service offered by all 32 councils across Scotland, eliminating those recycling programmes associated with flatted properties or multi-occupancies. The results of the regression analysis model have identified three principal factors which explain 80% of the variability in the average yield of the principal dry recyclate services: weekly residual waste capacity, number of materials collected and the weekly recycling capacity. The use of the model has been evaluated and recommendations made on ongoing methodological development and the use of the results in informing the design of kerbside recycling programmes. We hope that the research can provide insights for the further development of methods to optimise the design and operation of kerbside recycling programmes.
Lewin, M.D.; Sarasua, S.; Jones, P.A. . Div. of Health Studies)
1999-07-01
For the purpose of examining the association between blood lead levels and household-specific soil lead levels, the authors used a multivariate linear regression model to find a slope factor relating soil lead levels to blood lead levels. They used previously collected data from the Agency for Toxic Substances and Disease Registry's (ATSDR's) multisite lead and cadmium study. The data included in the blood lead measurements of 1,015 children aged 6--71 months, and corresponding household-specific environmental samples. The environmental samples included lead in soil, house dust, interior paint, and tap water. After adjusting for income, education or the parents, presence of a smoker in the household, sex, and dust lead, and using a double log transformation, they found a slope factor of 0.1388 with a 95% confidence interval of 0.09--0.19 for the dose-response relationship between the natural log of the soil lead level and the natural log of the blood lead level. The predicted blood lead level corresponding to a soil lead level of 500 mg/kg was 5.99 [micro]g/kg with a 95% prediction interval of 2.08--17.29. Predicted values and their corresponding prediction intervals varied by covariate level. The model shows that increased soil lead level is associated with elevated blood leads in children, but that predictions based on this regression model are subject to high levels of uncertainty and variability.
NASA Astrophysics Data System (ADS)
Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea
2016-12-01
The assessment of metals and organic micropollutants contamination in agricultural soils is a difficult challenge due to the extensive area used to collect and analyze a very large number of samples. With Dioxins and dioxin-like PCBs measurement methods and subsequent the treatment of data, the European Community advises the develop low-cost and fast methods allowing routing analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work has been to find a method suitable to describe the relations occurring between organic and inorganic contaminants and use the value of the latter in order to forecast the former. In practice, the use of a metal portable soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique has shown to be an excellent forecasting method, though there is no linear correlation between the variables to be analyzed.
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
NASA Astrophysics Data System (ADS)
Matahwa, H.; Ramiah, V.; Sanderson, R. D.
2008-10-01
Crystallization of calcium carbonate was performed in the presence of grafted polysaccharides, polyacrylamide (PAM) and polyacrylic acid (PAA). The grafted polysaccharides gave crystal morphologies that were different from the unmodified polysaccharides but similar to the ones given by homopolymers of the grafted chains. PAM-grafted α-cellulose gave rectangular platelets that aggregated to form 'spherical' crystals on the surface of the fiber, whereas PAA grafted α-cellulose gave spherical crystals on the surface of the fiber. X-ray diffraction (XRD) spectroscopy showed that PAM-grafted α-cellulose, PAM as well as the control (no polymeric additive) gave calcite crystals at both 25 and 80 °C. However, the PAA-grafted α-cellulose and PAA homopolymer gave calcite and vaterite crystals at 25 °C with calcite and aragonite crystals along with traces of vaterite being formed at 80 °C. The fiber surface coverage by these crystals was more on the acrylic- and acrylamide-grafted cellulose than on the ungrafted α-cellulose. The evolution of CaCO 3 polymorphs as well as crystal morphology in PAA-grafted starch was similar to that of PAA-grafted α-cellulose at the two temperatures employed.
El Dib, Regina; Gomaa, Huda; Ortiz, Alberto; Politei, Juan; Kapoor, Anil; Barreto, Fellype
2017-01-01
Background Anderson-Fabry disease (AFD) is an X-linked recessive inborn error of glycosphingolipid metabolism caused by a deficiency of alpha-galactosidase A. Renal failure, heart and cerebrovascular involvement reduce survival. A Cochrane review provided little evidence on the use of enzyme replacement therapy (ERT). We now complement this review through a linear regression and a pooled analysis of proportions from cohort studies. Objectives To evaluate the efficacy and safety of ERT for AFD. Materials and methods For the systematic review, a literature search was performed, from inception to March 2016, using Medline, EMBASE and LILACS. Inclusion criteria were cohort studies, patients with AFD on ERT or natural history, and at least one patient-important outcome (all-cause mortality, renal, cardiovascular or cerebrovascular events, and adverse events) reported. The pooled proportion and the confidence interval (CI) are shown for each outcome. Simple linear regressions for composite endpoints were performed. Results 77 cohort studies involving 15,305 participants proved eligible. The pooled proportions were as follows: a) for renal complications, agalsidase alfa 15.3% [95% CI 0.048, 0.303; I2 = 77.2%, p = 0.0005]; agalsidase beta 6% [95% CI 0.04, 0.07; I2 = not applicable]; and untreated patients 21.4% [95% CI 0.1522, 0.2835; I2 = 89.6%, p<0.0001]. Effect differences favored agalsidase beta compared to untreated patients; b) for cardiovascular complications, agalsidase alfa 28% [95% CI 0.07, 0.55; I2 = 96.7%, p<0.0001]; agalsidase beta 7% [95% CI 0.05, 0.08; I2 = not applicable]; and untreated patients 26.2% [95% CI 0.149, 0.394; I2 = 98.8%, p<0.0001]. Effect differences favored agalsidase beta compared to untreated patients; and c) for cerebrovascular complications, agalsidase alfa 11.1% [95% CI 0.058, 0.179; I2 = 70.5%, p = 0.0024]; agalsidase beta 3.5% [95% CI 0.024, 0.046; I2 = 0%, p = 0.4209]; and untreated patients 18.3% [95% CI 0.129, 0.245; I2 = 95% p < 0
Lewin, M D; Sarasua, S; Jones, P A
1999-07-01
For the purpose of examining the association between blood lead levels and household-specific soil lead levels, we used a multivariate linear regression model to find a slope factor relating soil lead levels to blood lead levels. We used previously collected data from the Agency for Toxic Substances and Disease Registry's (ATSDR's) multisite lead and cadmium study. The data included the blood lead measurements (0.5 to 40.2 microg/dL) of 1015 children aged 6-71 months, and corresponding household-specific environmental samples. The environmental samples included lead in soil (18.1-9980 mg/kg), house dust (5.2-71,000 mg/kg), interior paint (0-16.5 mg/cm2), and tap water (0.3-103 microg/L). After adjusting for income, education of the parents, presence of a smoker in the household, sex, and dust lead, and using a double log transformation, we found a slope factor of 0.1388 with a 95% confidence interval of 0.09-0.19 for the dose-response relationship between the natural log of the soil lead level and the natural log of the blood lead level. The predicted blood lead level corresponding to a soil lead level of 500 mg/kg was 5.99 microg/kg with a 95% prediction interval of 2. 08-17.29. Predicted values and their corresponding prediction intervals varied by covariate level. The model shows that increased soil lead level is associated with elevated blood leads in children, but that predictions based on this regression model are subject to high levels of uncertainty and variability.
Boischio, A A; Henshel, D S
2000-06-01
This research is focused on prenatal and early postnatal mercury (Hg) exposure among the riverside people along the Upper Madeira river in the Amazon. Linear regression models were developed to predict the hair Hg concentration in infants. The independent variables included in the model of Group 1 (87 pairs of mothers and their infants) were the average maternal hair Hg concentration and maternal age. Group 2 (31 pairs) included maternal segmental hair Hg concentrations. For the segmental hair Hg analysis over time, it was assumed that hair grows at a rate of 11 cm per month. Thus, information on the timing of the dates of pregnancy and breast feeding from the birth history was used to cut the hair strands into segments, making them correspond to the mother's reproductive stage of life (31 pairs of mothers and their infants). Breast milk Hg concentration results were included with segmental and average maternal hair Hg concentration values (22 and 44 pairs of mothers and their infants, respectively). The models including the breast milk Hg concentration indicated that 61 and 55% of the variability of the infant hair Hg concentrations were due to the independent variables: segmental maternal hair Hg with breast milk Hg and average maternal hair Hg with breast milk Hg, respectively. The regression coefficients were in the range of 0.19 to 0.90, and P values were in the range of 0.0001 to 0.1490. Further recommendations include fish advisories to prevent critical Hg exposures during reproductive life and investigation of neurobehavioral performance of this study population.
Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties–sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen–in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models–multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)–were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of
Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties-sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen-in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models-multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)-were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness
Anders, Christian; Urbassek, Herbert M.; Johnson, Robert E.
2004-10-15
Using molecular-dynamics simulation, we study sputtering of a condensed-gas solid induced by the impact of atomic clusters with sizes 1{<=}n{<=}10{sup 4}. Above a nonlinear onset regime, we find a linear increase of the sputter yield Y with the total energy E of the bombarding cluster. The fitting coefficients in the linear regime depend only on the cluster size n such that for fixed bombardment energy, sputtering decreases with increasing cluster size n. We find that to a good approximation the sputter yield in this regime obeys an additivity rule in cluster size n such that doubling the cluster size at the same cluster velocity amounts to doubling the sputter yield. The sputter-limiting energy {epsilon}{sub s} is introduced which separates erosion ({epsilon}>{epsilon}{sub s}) from growth ({epsilon}<{epsilon}{sub s}) under cluster impact.
NASA Astrophysics Data System (ADS)
Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.
2016-06-01
The threat of the ailments related to urbanization like heat stress is very prevalent. There are a lot of things that can be done to lessen the effect of urbanization to the surface temperature of the area like using green roofs or planting trees in the area. So land use really matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important so as to provide a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC as well as to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and LULC classification was derived from LiDAR and Orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs and these metrics were analysed using a statistical framework. Multi linear regression was done to create models that would predict LST for each class and it was found that the spatial metric "Effective mesh size" was a top predictor for LST in 6 out of 7 classes. The model created can still be refined by adding a temporal aspect by analysing the LST of another farming period (for rural areas) and looking for common predictors between LSTs of these two different farming periods.
Jalali-Heravi, M; Parastar, F
2000-12-01
A new series of six comprehensive descriptors that represent different features of the gas-liquid partition coefficient, K(L), for commonly used stationary phases is developed. These descriptors can be considered as counterparts of the parameters in the Abraham solvatochromic model of solution. A separate multiple linear regression (MLR) model was developed by using the six descriptors for each stationary phase of poly(ethylene glycol adipate) (EGAD), N,N,N',N'-tetrakis(2-hydroxypropyl) ethylenediamine (THPED), poly(ethylene glycol) (Ucon 50 HB 660) (U50HB), di(2-ethylhexyl)phosphoric acid (DEHPA) and tetra-n-butylammonium N,N-(bis-2-hydroxylethyl)-2-aminoethanesulfonate (QBES). The results obtained using these models are in good agreement with the experiment and with the results of the empirical model based on the solvatochromic theory. A 6-6-5 neural network was developed using the descriptors appearing in the MLR models as inputs. Comparison of the mean square errors (MSEs) shows the superiority of the artificial neural network (ANN) over that of the MLR. This indicates that the retention behavior of the molecules on different columns show some nonlinear characteristics. The experimental solvatochromic parameters proposed by Abraham can be replaced by the calculated descriptors in this work.
NASA Astrophysics Data System (ADS)
McCormick, Patrick W.; Lewis, Gary D.; Dujovny, Manuel; Ausman, James I.; Stewart, Mick; Widman, Ronald A.
1992-05-01
Near infrared light generated by specialized instrumentation was passed through artificially oxygenated human blood during simultaneous sampling by a co-oximeter. Characteristic absorption spectra were analyzed to calculate the ratio of oxygenated to reduced hemoglobin. A positive linear regression fit between diffuse transmission oximetry and measured blood oxygenation over the range 23% to 99% (r2 equals .98, p < .001) was noted. The same technology was used to pass two channels of light through the scalp of brain-injured patients with prolonged, decreased level of consciousness in a tertiary care neuroscience ICU. Transmission data were collected with gross superficial-to-deep spatial resolution. Saturation calculation based on the deep signal was observed in the patient over time. The procedure was able to be performed clinically without difficulty; rSO2 values recorded continuously demonstrate the usefulness of the technique. Using the same instrumentation, arterial input and cerebral response functions, generated by IV tracer bolus, were deconvoluted to measure mean cerebral transit time. Date collected over time provided a sensitive index of changes in cerebral blood flow as a result of therapeutic maneuvers.
Caselli, Maurizio; Mangone, Annarosa; Paolillo, Paola; Traini, Angela
2002-01-01
The pKa of 3',3",5',5"tetrabromo-m-cresolsulfonephtalein (Bromocresol Green) and o-cresolsulphonephtalein (Cresol Red) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at different water/surfactant ratios. Extended Principal Component Analysis was used for a precise determination of the apparent pKa and of the spectra of the acid and base forms of the dye. The apparent pKa of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Combination with multiple linear regression increases the precision. Results are discussed taking into account the profile of the electrostatic potential in the water pool and the possible partition of the indicator between the aqueous core and the surfactant. The pKa corrected for these effects are independent of w0 and are close to the value of the pKa in bulk water. On the basis of a tentative hypothesis it is possible to calculate the true pKa of the buffer in the pool.
Golmohammadi, Hassan
2009-11-30
A quantitative structure-property relationship (QSPR) study was performed to develop models those relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecule orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The result obtained showed the ability of developed artificial neural network to prediction of partition coefficients of organic compounds. Also, the results revealed the superiority of ANN over the MLR and PLS models.
Yu, Jianwei; Liu, Juan; An, Wei; Wang, Yongjing; Zhang, Junzhi; Wei, Wei; Su, Ming; Yang, Min
2015-01-01
A total of 86 source water samples from 38 cities across major watersheds of China were collected for a bromide (Br(-)) survey, and the bromate (BrO3 (-)) formation potentials (BFPs) of 41 samples with Br(-) concentration >20 μg L(-1) were evaluated using a batch ozonation reactor. Statistical analyses indicated that higher alkalinity, hardness, and pH of water samples could lead to higher BFPs, with alkalinity as the most important factor. Based on the survey data, a multiple linear regression (MLR) model including three parameters (alkalinity, ozone dose, and total organic carbon (TOC)) was established with a relatively good prediction performance (model selection criterion = 2.01, R (2) = 0.724), using logarithmic transformation of the variables. Furthermore, a contour plot was used to interpret the influence of alkalinity and TOC on BrO3 (-) formation with prediction accuracy as high as 71 %, suggesting that these two parameters, apart from ozone dosage, were the most important ones affecting the BFPs of source waters with Br(-) concentration >20 μg L(-1). The model could be a useful tool for the prediction of the BFPs of source water.
Schwantes-An, Tae-Hwi; Sung, Heejong; Sabourin, Jeremy A; Justice, Cristina M; Sorant, Alexa J M; Wilson, Alexander F
2016-01-01
In this study, the effects of (a) the minor allele frequency of the single nucleotide variant (SNV), (b) the degree of departure from normality of the trait, and (c) the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, 5 simulated traits were considered: standard normal and gamma distributed traits; 2 transformed versions of the gamma trait (log10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Each trait was tested with 313,340 SNVs. Tests of association were performed with simple linear regression and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits that increased as the minor allele frequency decreased. The inflation of average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error.
Dikaios, Nikolaos; Atkinson, David; Tudisca, Chiara; Purpura, Pierpaolo; Forster, Martin; Ahmed, Hashim; Beale, Timothy; Emberton, Mark; Punwani, Shonit
2017-03-01
The aim of this work is to compare Bayesian Inference for nonlinear models with commonly used traditional non-linear regression (NR) algorithms for estimating tracer kinetics in Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI). The algorithms are compared in terms of accuracy, and reproducibility under different initialization settings. Further it is investigated how a more robust estimation of tracer kinetics affects cancer diagnosis. The derived tracer kinetics from the Bayesian algorithm were validated against traditional NR algorithms (i.e. Levenberg-Marquardt, simplex) in terms of accuracy on a digital DCE phantom and in terms of goodness-of-fit (Kolmogorov-Smirnov test) on ROI-based concentration time courses from two different patient cohorts. The first cohort consisted of 76 men, 20 of whom had significant peripheral zone prostate cancer (any cancer-core-length (CCL) with Gleason>3+3 or any-grade with CCL>=4mm) following transperineal template prostate mapping biopsy. The second cohort consisted of 9 healthy volunteers and 24 patients with head and neck squamous cell carcinoma. The diagnostic ability of the derived tracer kinetics was assessed with receiver operating characteristic area under curve (ROC AUC) analysis. The Bayesian algorithm accurately recovered the ground-truth tracer kinetics for the digital DCE phantom consistently improving the Structural Similarity Index (SSIM) across the 50 different initializations compared to NR. For optimized initialization, Bayesian did not improve significantly the fitting accuracy on both patient cohorts, and it only significantly improved the ve ROC AUC on the HN population from ROC AUC=0.56 for the simplex to ROC AUC=0.76. For both cohorts, the values and the diagnostic ability of tracer kinetic parameters estimated with the Bayesian algorithm weren't affected by their initialization. To conclude, the Bayesian algorithm led to a more accurate and reproducible quantification of tracer kinetic
Hu, L; Liang, M; Mouraux, A; Wise, R G; Hu, Y; Iannetti, G D
2011-12-01
Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLR(d)) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLR(d) method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLR(d) approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLR(d) effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLR(d) can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli.
Esbaugh, A J; Brix, K V; Mager, E M; De Schamphelaere, K; Grosell, M
2012-03-01
The current study examined the chronic toxicity of lead (Pb) to three invertebrate species: the cladoceran Ceriodaphnia dubia, the snail Lymnaea stagnalis and the rotifer Philodina rapida. The test media consisted of natural waters from across North America, varying in pertinent water chemistry parameters including dissolved organic carbon (DOC), calcium, pH and total CO(2). Chronic toxicity was assessed using reproductive endpoints for C. dubia and P. rapida while growth was assessed for L. stagnalis, with chronic toxicity varying markedly according to water chemistry. A multi-linear regression (MLR) approach was used to identify the relative importance of individual water chemistry components in predicting chronic Pb toxicity for each species. DOC was an integral component of MLR models for C. dubia and L. stagnalis, but surprisingly had no predictive impact on chronic Pb toxicity for P. rapida. Furthermore, sodium and total CO(2) were also identified as important factors affecting C. dubia toxicity; no other factors were predictive for L. stagnalis. The Pb toxicity of P. rapida was predicted by calcium and pH. The predictive power of the C. dubia and L. stagnalis MLR models was generally similar to that of the current C. dubia BLM, with R(2) values of 0.55 and 0.82 for the respective MLR models, compared to 0.45 and 0.79 for the respective BLMs. In contrast the BLM poorly predicted P. rapida toxicity (R(2)=0.19), as compared to the MLR (R(2)=0.92). The cross species variability in the effects of water chemistry, especially with respect to rotifers, suggests that cross species modeling of invertebrate chronic Pb toxicity using a C. dubia model may not always be appropriate.
Callén, M S; López, J M; Mastral, A M
2010-08-15
The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view especially with the introduction of the Directive 2004/107/EC and due to the carcinogenic character of this pollutant. A sampling campaign of particulate matter less or equal than 10 microns (PM10) carried out during 2008-2009 in four locations of Spain was collected to determine experimentally BaP concentrations by gas chromatography mass-spectrometry mass-spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations in two sampling places, taking PM10 and meteorological variables as possible predictors. The model obtained with data from two sampling sites (all sites model) (R(2)=0.817, PRESS/SSY=0.183) included the significant variables like PM10, temperature, solar radiation and wind speed and was internally and externally validated. The first validation was performed by cross validation and the last one by BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation to estimate BaP concentrations in urban atmospheres with very good internal prediction (Q(CV)(2)=0.813, PRESS/SSY=0.187) and with the maximal external prediction for the 2001-2002 campaign (Q(ext)(2)=0.679 and PRESS/SSY=0.321) versus the 2001-2004 campaign (Q(ext)(2)=0.551, PRESS/SSY=0.449).
NASA Astrophysics Data System (ADS)
Barbu, N.; Cuculeanu, V.; Stefan, S.
2016-10-01
The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. In order to achieve this, two catalogues from COST733Action were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and have been utilized as predictors within the multiple linear regression model (MLRM) for the estimation of winter and summer TX90p values for 85 synoptic stations covering the entire Romania. A forward selection procedure has been utilized to find adequate predictor combinations and those predictor combinations were tested for collinearity. The performance of the MLRMs has been quantified based on the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level in order to obtain reliable evidence of MLRM robustness. From this analysis, it can be stated that the MLRM performance is higher in winter compared to summer. This is due to the annual cycle of incoming insolation and to the local factors such as orography and surface albedo variations. The MLRM performances exhibit distinct variations between regions with high performance in wintertime for the eastern and southern part of the country and in summertime for the western part of the country. One can conclude that the MLRM generally captures quite well the TX90p variability and reveals the potential for statistical downscaling of TX90p values based on circulation types.
Lee, Paul H.
2016-01-01
Healthy adults are advised to perform at least 150 min of moderate-intensity physical activity weekly, but this advice is based on studies using self-reports of questionable validity. This study examined the dose-response relationship of accelerometer-measured physical activity and sedentary behaviors on all-cause mortality using segmented Cox regression to empirically determine the break-points of the dose-response relationship. Data from 7006 adult participants aged 18 or above in the National Health and Nutrition Examination Survey waves 2003–2004 and 2005–2006 were included in the analysis and linked with death certificate data using a probabilistic matching approach in the National Death Index through December 31, 2011. Physical activity and sedentary behavior were measured using ActiGraph model 7164 accelerometer over the right hip for 7 consecutive days. Each minute with accelerometer count <100; 1952–5724; and ≥5725 were classified as sedentary, moderate-intensity physical activity, and vigorous-intensity physical activity, respectively. Segmented Cox regression was used to estimate the hazard ratio (HR) of time spent in sedentary behaviors, moderate-intensity physical activity, and vigorous-intensity physical activity and all-cause mortality, adjusted for demographic characteristics, health behaviors, and health conditions. Data were analyzed in 2016. During 47,119 person-year of follow-up, 608 deaths occurred. Each additional hour per day of sedentary behaviors was associated with a HR of 1.15 (95% CI 1.01, 1.31) among participants who spend at least 10.9 h per day on sedentary behaviors, and each additional minute per day spent on moderate-intensity physical activity was associated with a HR of 0.94 (95% CI 0.91, 0.96) among participants with daily moderate-intensity physical activity ≤14.1 min. Associations of moderate physical activity and sedentary behaviors on all-cause mortality were independent of each other. To conclude, evidence from
Liu, Tong-Zu; Xu, Chang; Rota, Matteo; Cai, Hui; Zhang, Chao; Shi, Ming-Jun; Yuan, Rui-Xia; Weng, Hong; Meng, Xiang-Yu; Kwong, Joey S W; Sun, Xin
2017-04-01
Approximately 27-37% of the general population experience prolonged sleep duration and 12-16% report shortened sleep duration. However, prolonged or shortened sleep duration may be associated with serious health problems. A comprehensive, flexible, non-linear meta-regression with restricted cubic spline (RCS) was used to investigate the dose-response relationship between sleep duration and all-cause mortality in adults. Medline (Ovid), Embase, EBSCOhost-PsycINFO, and EBSCOhost-CINAHL Plus databases, reference lists of relevant review articles, and included studies were searched up to Nov. 29, 2015. Prospective cohort studies investigating the association between sleep duration and all-cause mortality in adults with at least three categories of sleep duration were eligible for inclusion. We eventually included in our study 40 cohort studies enrolling 2,200,425 participants with 271,507 deaths. A J-shaped association between sleep duration and all-cause mortality was present: compared with 7 h of sleep (reference for 24-h sleep duration), both shortened and prolonged sleep durations were associated with increased risk of all-cause mortality (4 h: relative risk [RR] = 1.05; 95% confidence interval [CI] = 1.02-1.07; 5 h: RR = 1.06; 95% CI = 1.03-1.09; 6 h: RR = 1.04; 95% CI = 1.03-1.06; 8 h: RR = 1.03; 95% CI = 1.02-1.05; 9 h: RR = 1.13; 95% CI = 1.10-1.16; 10 h: RR = 1.25; 95% CI = 1.22-1.28; 11 h: RR = 1.38; 95% CI = 1.33-1.44; n = 29; P < 0.01 for non-linear test). With regard to the night-sleep duration, prolonged night-sleep duration was associated with increased all-cause mortality (8 h: RR = 1.01; 95% CI = 0.99-1.02; 9 h: RR = 1.08; 95% CI = 1.05-1.11; 10 h: RR = 1.24; 95% CI = 1.21-1.28; n = 13; P < 0.01 for non-linear test). Subgroup analysis showed females with short sleep duration a day (<7 h) were at high risk of all-cause mortality (4 h: RR = 1.07; 95% CI = 1.02-1.13; 5 h: RR = 1.08; 95
Improved Regression Calibration
ERIC Educational Resources Information Center
Skrondal, Anders; Kuha, Jouni
2012-01-01
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…
NASA Astrophysics Data System (ADS)
Cotos-Yáñez, Tomas R.; Rodríguez-Rajo, F. J.; Jato, M. V.
Betula pollen is a common cause of pollinosis in localities in NW Spain and between 13% and 60% of individuals who are immunosensitive to pollen grains respond positively to its allergens. It is important in the case of all such people to be able to predict pollen concentrations in advance. We therefore undertook an aerobiological study in the city of Vigo (Pontevedra, Spain) from 1995 to 2001, using a Hirst active-impact pollen trap (VPPS 2000) situated in the city centre. Vigo presents a temperate maritime climate with a mean annual temperature of 14.9 °C and 1,412 mm annual total precipitation. This paper analyses two ways of quantifying the prediction of pollen concentration: first by means of a generalized additive regression model with the object of predicting whether the series of interest exceeds a certain threshold; second using a partially linear model to obtain specific prediction values for pollen grains. Both models use a self-explicative part and another formed by exogenous meteorological factors. The models were tested with data from 2001 (year in which the total precipitation registered was almost twice the climatological average overall during the flowering period), which were not used in formulating the models. A highly satisfactory classification and good forecasting results were achieved with the first and second approaches respectively. The estimated line taking into account temperature and a calm S-SW wind, corresponds to the real line recorded during 2001, which gives us an idea of the proposed model's validity.
Khan, M K I; Naznin, M
2013-10-01
breeds were lowered after fitting the linear regression. The co-efficient of determination (R(2)) of male and female black Bengal and Jamunapari goats kids similar.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A
2013-05-01
This manuscript discusses the application and the comparison between three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to the high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug Acyclovir which was analyzed in human plasma with the use of ganciclovir as internal standard. In vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-points sin x i polynomials (discrete Fourier functions). This work studies and also compares the application of WR method and Theil's method, a nonparametric regression (NPR) method with the least squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use WR method. WR was found to be superior to the method of LSPR as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to the method of LSPR as the former assumes that errors could occur in both x- and y-directions and that might not be normally distributed. Most of the results showed a significant improvement in the precision and accuracy on applying WR and NPR methods relative to LSPR.
Interaction Models for Functional Regression
USSET, JOSEPH; STAICU, ANA-MARIA; MAITY, ARNAB
2015-01-01
A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data. PMID:26744549
Technology Transfer Automated Retrieval System (TEKTRAN)
Geospatial measurements of ancillary sensor data, such as bulk soil electrical conductivity or remotely sensed imagery data, are commonly used to characterize spatial variation in soil or crop properties. Geostatistical techniques like kriging with external drift or regression kriging are often use...
NASA Technical Reports Server (NTRS)
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
Xu, Xu; Chang, Chien-Chi; Lu, Ming-Lun
2012-01-01
Previous studies have indicated that cumulative L5/S1 joint load is a potential risk factor for low back pain. The assessment of cumulative L5/S1 joint load during a field study is challenging due to the difficulty of continuously monitoring the dynamic joint load. This study proposes two regression models predicting cumulative dynamic L5/S1 joint moment based on the static L5/S1 joint moment of a lifting task at lift-off and set-down and the lift duration. Twelve men performed lifting tasks at varying lifting ranges and asymmetric angles in a laboratory environment. The cumulative L5/S1 joint moment was calculated from continuous dynamic L5/S1 moments as the reference for comparison. The static L5/S1 joint moments at lift-off and set-down were measured for the two regression models. The prediction error of the cumulative L5/S1 joint moment was 21 ± 14 Nm × s (12% of the measured cumulative L5/S1 joint moment) and 14 ± 9 Nm × s (8%) for the first and the second models, respectively. Practitioner Summary: The proposed regression models may provide a practical approach for predicting the cumulative dynamic L5/S1 joint loading of a lifting task for field studies since it requires only the lifting duration and the static moments at the lift-off and/or set-down instants of the lift.
Keith, Scott W.; Allison, David B.
2014-01-01
This paper details the design, evaluation, and implementation of a framework for detecting and modeling non-linearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and non-parametric bootstrapping. Unlike other non-linear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI, kg/m2) and the complex multistage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic non-linear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and non-linearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz’s Bayesian Information Criterion (BIC) and Akaike’s Information Criterion (AIC), in terms of correctly selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions. PMID:25610831
Tomlinson, Sean
2016-04-01
The calculation and comparison of physiological characteristics of thermoregulation has provided insight into patterns of ecology and evolution for over half a century. Thermoregulation has typically been explored using linear techniques; I explore the application of non-linear scaling to more accurately calculate and compare characteristics and thresholds of thermoregulation, including the basal metabolic rate (BMR), peak metabolic rate (PMR) and the lower (Tlc) and upper (Tuc) critical limits to the thermo-neutral zone (TNZ) for Australian rodents. An exponentially-modified logistic function accurately characterised the response of metabolic rate to ambient temperature, while evaporative water loss was accurately characterised by a Michaelis-Menten function. When these functions were used to resolve unique parameters for the nine species studied here, the estimates of BMR and TNZ were consistent with the previously published estimates. The approach resolved differences in rates of metabolism and water loss between subfamilies of Australian rodents that haven't been quantified before. I suggest that non-linear scaling is not only more effective than the established segmented linear techniques, but also is more objective. This approach may allow broader and more flexible comparison of characteristics of thermoregulation, but it needs testing with a broader array of taxa than those used here.
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
NASA Astrophysics Data System (ADS)
Naguib, Ibrahim A.; Abdelaleem, Eglal A.; Draz, Mohammed E.; Zaazaa, Hala E.
2014-09-01
Partial least squares regression (PLSR) and support vector regression (SVR) are two popular chemometric models that are being subjected to a comparative study in the presented work. The comparison shows their characteristics via applying them to analyze Hydrochlorothiazide (HCZ) and Benazepril hydrochloride (BZ) in presence of HCZ impurities; Chlorothiazide (CT) and Salamide (DSA) as a case study. The analysis results prove to be valid for analysis of the two active ingredients in raw materials and pharmaceutical dosage form through handling UV spectral data in range (220-350 nm). For proper analysis a 4 factor 4 level experimental design was established resulting in a training set consisting of 16 mixtures containing different ratios of interfering species. An independent test set consisting of 8 mixtures was used to validate the prediction ability of the suggested models. The results presented indicate the ability of mentioned multivariate calibration models to analyze HCZ and BZ in presence of HCZ impurities CT and DSA with high selectivity and accuracy of mean percentage recoveries of (101.01 ± 0.80) and (100.01 ± 0.87) for HCZ and BZ respectively using PLSR model and of (99.78 ± 0.80) and (99.85 ± 1.08) for HCZ and BZ respectively using SVR model. The analysis results of the dosage form were statistically compared to the reference HPLC method with no significant differences regarding accuracy and precision. SVR model gives more accurate results compared to PLSR model and show high generalization ability, however, PLSR still keeps the advantage of being fast to optimize and implement.
Bardsley, W G; McGinlay, P B
1987-05-21
Computer fitting of binding data is discussed and it is concluded that the main problem is the choice of starting estimates and internal scaling parameters, not the optimization software. Solving linear overdetermined systems of equations for starting estimates is investigated. A function, Q, is introduced to study model discrimination with binding isotherms and the behaviour of Q as a function of model parameters is calculated for the case of 2 and 3 sites. The power function of the F test is estimated for models with 2 to 5 binding sites and necessary constraints on parameters for correct model discrimination are given. The sampling distribution of F test statistics is compared to an exact F distribution using the Chi-squared and Kolmogorov-Smirnov tests. For low order modes (n less than 3) the F test statistics are approximately F distributed but for higher order models the test statistics are skewed to the left of the F distribution. The parameter covariance matrix obtained by inverting the Hessian matrix of the objective function is shown to be a good approximation to the estimate obtained by Monte Carlo sampling for low order models (n less than 3). It is concluded that analysis of up to 2 or 3 binding sites presents few problems and linear, normal statistical results are valid. To identify correctly 4 sites is much more difficult, requiring very precise data and extreme parameter values. Discrimination of 5 from 4 sites is an upper limit to the usefulness of the F test.
Keith, Scott W; Allison, David B
2014-09-29
This paper details the design, evaluation, and implementation of a framework for detecting and modeling nonlinearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and nonparametric bootstrapping. Unlike other nonlinear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI; kg/m(2)) and the complex multi-stage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic nonlinear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and nonlinearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz's Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC), in terms of correctly selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions.
Gebrehiwot, Tesfay Gebregzabher; San Sebastian, Miguel; Edin, Kerstin; Goicolea, Isabel
2015-01-01
Background In 2003, the Ethiopian Ministry of Health established the Health Extension Program (HEP), with the goal of improving access to health care and health promotion activities in rural areas of the country. This paper aims to assess the association of the HEP with improved utilization of maternal health services in Northern Ethiopia using institution-based retrospective data. Methods Average quarterly total attendances for antenatal care (ANC), delivery care (DC) and post-natal care (PNC) at health posts and health care centres were studied from 2002 to 2012. Regression analysis was applied to two models to assess whether trends were statistically significant. One model was used to estimate the level and trend changes associated with the immediate period of intervention, while changes related to the post-intervention period were estimated by the other. Results The total number of consultations for ANC, DC and PNC increased constantly, particularly after the late-intervention period. Increases were higher for ANC and PNC at health post level and for DC at health centres. A positive statistically significant upward trend was found for DC and PNC in all facilities (p<0.01). The positive trend was also present in ANC at health centres (p = 0.04), but not at health posts. Conclusion Our findings revealed an increase in the use of antenatal, delivery and post-natal care after the introduction of the HEP. We are aware that other factors, that we could not control for, might be explaining that increase. The figures for DC and PNC are however low and more needs to be done in order to increase the access to the health care system as well as the demand for these services by the population. Strengthening of the health information system in the region needs also to be prioritized. PMID:26218074
Caravaggi, Paolo; Leardini, Alberto; Giacomozzi, Claudia
2016-10-03
Plantar load can be considered as a measure of the foot ability to transmit forces at the foot/ground, or foot/footwear interface during ambulatory activities via the lower limb kinematic chain. While morphological and functional measures have been shown to be correlated with plantar load, no exhaustive data are currently available on the possible relationships between range of motion of foot joints and plantar load regional parameters. Joints' kinematics from a validated multi-segmental foot model were recorded together with plantar pressure parameters in 21 normal-arched healthy subjects during three barefoot walking trials. Plantar pressure maps were divided into six anatomically-based regions of interest associated to corresponding foot segments. A stepwise multiple regression analysis was performed to determine the relationships between pressure-based parameters, joints range of motion and normalized walking speed (speed/subject height). Sagittal- and frontal-plane joint motion were those most correlated to plantar load. Foot joints' range of motion and normalized walking speed explained between 6% and 43% of the model variance (adjusted R(2)) for pressure-based parameters. In general, those joints' presenting lower mobility during stance were associated to lower vertical force at forefoot and to larger mean and peak pressure at hindfoot and forefoot. Normalized walking speed was always positively correlated to mean and peak pressure at hindfoot and forefoot. While a large variance in plantar pressure data is still not accounted for by the present models, this study provides statistical corroboration of the close relationship between joint mobility and plantar pressure during stance in the normal healthy foot.
Williams, D Keith; Chadwick, M Ashley; Williams, Taufika Islam; Muddiman, David C
2008-12-01
Operation of any mass spectrometer requires implementation of mass calibration laws to translate experimentally measured physical quantities into a m/z range. While internal calibration in Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) offers several attractive features, including exposure of calibrant and analyte ions to identical experimental conditions (e.g. space charge), external calibration affords simpler pulse sequences and higher throughput. The automatic gain control method used in hybrid linear trap quadrupole (LTQ) FT-ICR-MS to consistently obtain the same ion population is not readily amenable to matrix-assisted laser desorption/ionization (MALDI) FT-ICR-MS, due to the heterogeneous nature and poor spot-to-spot reproducibility of MALDI. This can be compensated for by taking external calibration laws into account that consider magnetic and electric fields, as well as relative and total ion abundances. Herein, an evaluation of external mass calibration laws applied to MALDI-FT-ICR-MS is performed to achieve higher mass measurement accuracy (MMA).
Marami Milani, Mohammad Reza; Hense, Andreas; Rahmani, Elham; Ploeger, Angelika
2016-01-01
This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R2 (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI show the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for the impact studies of climate variability/change on agriculture and food science fields when short-time series or data with large uncertainty are available. PMID:28231147
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
Rodriguez-Gonzalez, Pablo; Bouchet, Sylvain; Monperrus, Mathilde; Tessier, Emmanuel; Amouroux, David
2013-03-01
The fate of mercury (Hg) and tin (Sn) compounds in ecosystems is strongly determined by their alkylation/dealkylation pathways. However, the experimental determination of those transformations is still not straightforward and methodologies need to be refined. The purpose of this work is the development of a comprehensive and adaptable tool for an accurate experimental assessment of specific formation/degradation yields and half-lives of elemental species in different aquatic environments. The methodology combines field incubations of coastal waters and surface sediments with the addition of species-specific isotopically enriched tracers and a mathematical approach based on the deconvolution of isotopic patterns. The method has been applied to the study of the environmental reactivity of Hg and Sn compounds in coastal water and surface sediment samples collected in two different coastal ecosystems of the South French Atlantic Coast (Arcachon Bay and Adour Estuary). Both the level of isotopically enriched species and the spiking solution composition were found to alter dibutyltin and monomethylmercury degradation yields, while no significant changes were measurable for tributyltin and Hg(II). For butyltin species, the presence of light was found to be the main source of degradation and removal of these contaminants from surface coastal environments. In contrast, photomediated processes do not significantly influence either the methylation of mercury or the demethylation of methylmercury. The proposed method constitutes an advancement from the previous element-specific isotopic tracers' approaches, which allows for instance to discriminate the extent of net and oxidative Hg demethylation and to identify which debutylation step is controlling the environmental persistence of butyltin compounds.
Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas
2013-01-01
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706
Understanding poisson regression.
Hayat, Matthew J; Higgins, Melinda
2014-04-01
Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes.
Zane, P A; Brindle, S D; Gause, D O; O'Buck, A J; Raghavan, P R; Tripp, S L
1990-09-01
The relationship between the physicochemical characteristics of 27 new drug candidates and their distribution into the melanin-containing structure of the rat eye, the uveal tract, was examined. Tissue distribution data were obtained from whole-body autoradiograms of pigmented Long-Evans rats sacrificed at 5 min and 96 hr after dosing. The physicochemical parameters considered include molecular weight, pKa, degree of ionization, octanol/water partition coefficient (log Po/w), drug-melanin binding energy, and acid/base status of the functional groups within the molecule. Multiple linear regression analysis was used to describe the best model correlating physicochemical and/or biological characteristics of these compounds to their initial distribution at 5 min and to the retention of residual radioactivity in ocular melanin at 96 hr post-injection. The early distribution was a function primarily of acid/base status, pKa, binding energy, and log P(o/w), whereas uveal tract retention in rats was a function of volume of distribution (V1), log P(o/w), pKa, and binding energy. Further, there was a relationship between the initial distribution of a compound into the uveal tract and its retention 96 hr later. More specifically, the structures most likely to be distributed and ultimately retained at high concentrations were those containing strongly basic functionalities, such as piperidine or piperazine moieties and other amines. Further, the more lipophilic and, hence, widely distributed the basic compound, the greater the likelihood that it interacts with ocular melanin. In summary, the use of multiple linear regression analysis was useful in distinguishing which physicochemical characteristics of a compound or group of compounds contributed to melanin binding in pigmented rats in vivo.
Totton, Sarah C; Farrar, Ashley M; Wilkins, Wendy; Bucher, Oliver; Waddell, Lisa A; Wilhelm, Barbara J; McEwen, Scott A; Rajić, Andrijana
2012-10-01
Eating inappropriately prepared poultry meat is a major cause of foodborne salmonellosis. Our objectives were to determine the efficacy of feed and water additives (other than competitive exclusion and antimicrobials) on reducing Salmonella prevalence or concentration in broiler chickens using systematic review-meta-analysis and to explore sources of heterogeneity found in the meta-analysis through meta-regression. Six electronic databases were searched (Current Contents (1999-2009), Agricola (1924-2009), MEDLINE (1860-2009), Scopus (1960-2009), Centre for Agricultural Bioscience (CAB) (1913-2009), and CAB Global Health (1971-2009)), five topic experts were contacted, and the bibliographies of review articles and a topic-relevant textbook were manually searched to identify all relevant research. Study inclusion criteria comprised: English-language primary research investigating the effects of feed and water additives on the Salmonella prevalence or concentration in broiler chickens. Data extraction and study methodological assessment were conducted by two reviewers independently using pretested forms. Seventy challenge studies (n=910 unique treatment-control comparisons), seven controlled studies (n=154), and one quasi-experiment (n=1) met the inclusion criteria. Compared to an assumed control group prevalence of 44 of 1000 broilers, random-effects meta-analysis indicated that the Salmonella cecal colonization in groups with prebiotics (fructooligosaccharide, lactose, whey, dried milk, lactulose, lactosucrose, sucrose, maltose, mannanoligosaccharide) added to feed or water was 15 out of 1000 broilers; with lactose added to feed or water it was 10 out of 1000 broilers; with experimental chlorate product (ECP) added to feed or water it was 21 out of 1000. For ECP the concentration of Salmonella in the ceca was decreased by 0.61 log(10)cfu/g in the treated group compared to the control group. Significant heterogeneity (Cochran's Q-statistic p≤0.10) was observed
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression.
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Rank regression: an alternative regression approach for data with outliers.
Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin
2014-10-01
Linear regression models are widely used in mental health and related health services research. However, the classic linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the generated estimates are unreliable when study data includes outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad-hoc rules used for data trimming are based on subjective criteria so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that includes outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares it to the results generated using classical regression models and semi-parametric regression models.
Foster, Guy M.; Graham, Jennifer L.
2016-04-06
The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete-water quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes
Cactus: An Introduction to Regression
ERIC Educational Resources Information Center
Hyde, Hartley
2008-01-01
When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…
Liang, Shuyan; Hu, Jun; Xie, Yuanyuan; Zhou, Qing; Zhu, Yanhong; Yang, Xiangliang
2016-01-01
Cancer immunotherapy based on nanodelivery systems has shown potential for treatment of various malignancies, owing to the benefits of tumor targeting of nanoparticles. However, induction of a potent T-cell immune response against tumors still remains a challenge. In this study, polyethylenimine-modified carboxyl-styrene/acrylamide (PS) copolymer nano-spheres were developed as a delivery system of unmethylated cytosine-phosphate-guanine (CpG) oligodeoxynucleotides and transforming growth factor-beta (TGF-β) receptor I inhibitors for cancer immunotherapy. TGF-β receptor I inhibitors (LY2157299, LY) were encapsulated to the PS via hydrophobic interaction, while CpG oligodeoxynucleotides were loaded onto the PS through electrostatic interaction. Compared to the control group, tumor inhibition in the PS-LY/CpG group was up to 99.7% without noticeable toxicity. The tumor regression may be attributed to T-cell activation and amplification in mouse models. The results highlight the additive effect of CpG and TGF-β receptor I inhibitors co-delivered in cancer immunotherapy. PMID:28008250
ERIC Educational Resources Information Center
Walton, Joseph M.; And Others
1978-01-01
Ridge regression is an approach to the problem of large standard errors of regression estimates of intercorrelated regressors. The effect of ridge regression on the estimated squared multiple correlation coefficient is discussed and illustrated. (JKS)
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
Relationship between Multiple Regression and Selected Multivariable Methods.
ERIC Educational Resources Information Center
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Wang, Ming; Flanders, W Dana; Bostick, Roberd M; Long, Qi
2012-12-20
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of measurement error. Although a regression model with batch as a categorical covariable yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study.
Regressive systemic sclerosis.
Black, C; Dieppe, P; Huskisson, T; Hart, F D
1986-01-01
Systemic sclerosis is a disease which usually progresses or reaches a plateau with persistence of symptoms and signs. Regression is extremely unusual. Four cases of established scleroderma are described in which regression is well documented. The significance of this observation and possible mechanisms of disease regression are discussed. Images PMID:3718012
Tharrington, Arnold N.
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.
Ehrsam, Eric; Kallini, Joseph R.; Lebas, Damien; Modiano, Philippe; Cotten, Hervé
2016-01-01
Fully regressive melanoma is a phenomenon in which the primary cutaneous melanoma becomes completely replaced by fibrotic components as a result of host immune response. Although 10 to 35 percent of cases of cutaneous melanomas may partially regress, fully regressive melanoma is very rare; only 47 cases have been reported in the literature to date. AH of the cases of fully regressive melanoma reported in the literature were diagnosed in conjunction with metastasis on a patient. The authors describe a case of fully regressive melanoma without any metastases at the time of its diagnosis. Characteristic findings on dermoscopy, as well as the absence of melanoma on final biopsy, confirmed the diagnosis. PMID:27672418
Zhong, Yang; Patel, Sandeep
2010-01-01
Building upon the nonadditive electrostatic force field for alcohols based on the CHARMM charge equilibration (CHEQ) formalism, we introduce atom-pair specific solute-solvent Lennard-Jones (LJ) parameters for alcohol-water interaction force fields targeting improved agreement with experimental hydration free energies of a series of small molecule linear alcohols as well as ab initio water-alcohol geometries and energetics. We consider short-chain, linear alcohols from methanol to butanol as they are canonical small-molecule organic model compounds to represent the hydroxyl chemical functionality for parameterizing biomolecular force fields for proteins. We discuss molecular dynamics simulations of dilute aqueous solutions of methanol and ethanol in TIP4P-FQ water, with particular discussion of solution densities, structure defined in radial distribution functions, electrostatic properties (dipole moment distributions), hydrogen bonding patterns of water, as well as a Kirkwood-Buff (KB) integral analysis. Calculation of the latter provides an assessment of how well classical force fields parameterized to at least semi-quantitatively match experimental hydration free energies capture the microscopic structures of dilute alcohol solutions; the latter translate into macroscopic thermodynamic properties through the application of KB analysis. We find that the CHEQ alcohol force fields of this work semi-quantitatively match experimental KB integrals for methanol and ethanol mole fractions of 0.1 and 0.2. The force field combination qualitatively captures the concentration dependence of the alcohol-alcohol and water-water KB integrals, but due to inadequacies in the representation of the microscopic structures in such systems (which cannot be parameterized in any systematic fashion), a priori quantitative description of alcohol-water KB integrals remains elusive. PMID:20687517
Quantile regression modeling for Malaysian automobile insurance premium data
NASA Astrophysics Data System (ADS)
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is a robust regression to outliers compared to mean regression models. Traditional mean regression models like Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of response variable (pure premium). We then compare the results of quantile regression model with Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = O.5) is always smaller than Gamma regression in all risk factors.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Mapping geogenic radon potential by regression kriging.
Pásztor, László; Szabó, Katalin Zsuzsanna; Szatmári, Gábor; Laborczi, Annamária; Horváth, Ákos
2016-02-15
Radon ((222)Rn) gas is produced in the radioactive decay chain of uranium ((238)U) which is an element that is naturally present in soils. Radon is transported mainly by diffusion and convection mechanisms through the soil depending mainly on the physical and meteorological parameters of the soil and can enter and accumulate in buildings. Health risks originating from indoor radon concentration can be attributed to natural factors and is characterized by geogenic radon potential (GRP). Identification of areas with high health risks require spatial modeling, that is, mapping of radon risk. In addition to geology and meteorology, physical soil properties play a significant role in the determination of GRP. In order to compile a reliable GRP map for a model area in Central-Hungary, spatial auxiliary information representing GRP forming environmental factors were taken into account to support the spatial inference of the locally measured GRP values. Since the number of measured sites was limited, efficient spatial prediction methodologies were searched for to construct a reliable map for a larger area. Regression kriging (RK) was applied for the interpolation using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. Firstly, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which are interpolated by kriging. The final map is the sum of the two component predictions. Overall accuracy of the map was tested by Leave-One-Out Cross-Validation. Furthermore the spatial reliability of the resultant map is also estimated by the calculation of the 90% prediction interval of the local prediction values. The applicability of the applied method as well as that of the map is discussed briefly.
The Geometry of Enhancement in Multiple Regression
ERIC Educational Resources Information Center
Waller, Niels G.
2011-01-01
In linear multiple regression, "enhancement" is said to occur when R[superscript 2] = b[prime]r greater than r[prime]r, where b is a p x 1 vector of standardized regression coefficients and r is a p x 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b [is congruent to] r and…
Canonical Analysis as a Generalized Regression Technique for Multivariate Analysis.
ERIC Educational Resources Information Center
Williams, John D.
The use of characteristic coding (dummy coding) is made in showing solutions to four multivariate problems using canonical analysis. The canonical variates can be themselves analyzed by the use of multiple linear regression. When the canonical variates are used as criteria in a multiple linear regression, the R2 values are equal to 0, where 0 is…
Orthogonal Projection in Teaching Regression and Financial Mathematics
ERIC Educational Resources Information Center
Kachapova, Farida; Kachapov, Ilias
2010-01-01
Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
Sundström, Katja; Cedervall, Therese; Ohlsson, Claes; Camacho-Hübner, Cecilia; Sävendahl, Lars
2014-12-01
The growth-promoting effect of combined therapy with GH and IGF-I in normal rats is not known. We therefore investigated the efficacy of treatment with recombinant human (rh)GH and/or rhIGF-I on longitudinal bone growth and bone mass in intact, prepubertal, female Sprague-Dawley rats. rhGH was injected twice daily sc (5 mg/kg·d) and rhIGF-I continuously infused sc (2.2 or 4.4 mg/kg·d) for 28 days. Longitudinal bone growth was monitored by weekly x-rays of tibiae and nose-anus length measurements, and tibial growth plate histomorphology was analyzed. Bone mass was evaluated by peripheral quantitative computed tomography. In addition, serum levels of IGF-I, rat GH, acid labile subunit, IGF binding protein-3, 150-kDa ternary complex formation, and markers of bone formation and degradation were measured. Monotherapy with rhGH was more effective than rhIGF-I (4.4 mg/kg·d) to increase tibia and nose-anus length, whereas combined therapy did not further increase tibia, or nose-anus, lengths or growth plate height. In contrast, combined rhGH and rhIGF-I (4.4 mg/kg·d) therapy had an additive stimulatory effect on cortical bone mass vs rhGH alone. Combined treatment with rhGH and rhIGF-I resulted in markedly higher serum IGF-I concentrations vs rhGH alone but did not compromise the endogenous secretion of GH. We conclude that rhIGF-I treatment augments cortical bone mass but does not further improve bone growth in rhGH-treated young, intact, female rats.
Foumani, Maryam; Vuong, Thu V; MacCormick, Benjamin; Master, Emma R
2015-01-01
The gluco-oligosaccharide oxidase from Sarocladium strictum CBS 346.70 (GOOX) is a single domain flavoenzyme that favourably oxidizes gluco- and xylo- oligosaccharides. In the present study, GOOX was shown to also oxidize plant polysaccharides, including cellulose, glucomannan, β-(1→3,1→4)-glucan, and xyloglucan, albeit to a lesser extent than oligomeric substrates. To improve GOOX activity on polymeric substrates, three carbohydrate binding modules (CBMs) from Clostridium thermocellum, namely CtCBM3 (type A), CtCBM11 (type B), and CtCBM44 (type B), were separately appended to the amino and carboxy termini of the enzyme, generating six fusion proteins. With the exception of GOOX-CtCBM3 and GOOX-CtCBM44, fusion of the selected CBMs increased the catalytic activity of the enzyme (kcat) on cellotetraose by up to 50%. All CBM fusions selectively enhanced GOOX binding to soluble and insoluble polysaccharides, and the immobilized enzyme on a solid cellulose surface remained stable and active. In addition, the CBM fusions increased the activity of GOOX on soluble glucomannan by up to 30% and on insoluble crystalline as well as amorphous cellulose by over 50%.
Foumani, Maryam; Vuong, Thu V.; MacCormick, Benjamin; Master, Emma R.
2015-01-01
The gluco-oligosaccharide oxidase from Sarocladium strictum CBS 346.70 (GOOX) is a single domain flavoenzyme that favourably oxidizes gluco- and xylo- oligosaccharides. In the present study, GOOX was shown to also oxidize plant polysaccharides, including cellulose, glucomannan, β-(1→3,1→4)-glucan, and xyloglucan, albeit to a lesser extent than oligomeric substrates. To improve GOOX activity on polymeric substrates, three carbohydrate binding modules (CBMs) from Clostridium thermocellum, namely CtCBM3 (type A), CtCBM11 (type B), and CtCBM44 (type B), were separately appended to the amino and carboxy termini of the enzyme, generating six fusion proteins. With the exception of GOOX-CtCBM3 and GOOX-CtCBM44, fusion of the selected CBMs increased the catalytic activity of the enzyme (kcat) on cellotetraose by up to 50%. All CBM fusions selectively enhanced GOOX binding to soluble and insoluble polysaccharides, and the immobilized enzyme on a solid cellulose surface remained stable and active. In addition, the CBM fusions increased the activity of GOOX on soluble glucomannan by up to 30 % and on insoluble crystalline as well as amorphous cellulose by over 50 %. PMID:25932926
George: Gaussian Process regression
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel
2015-11-01
George is a fast and flexible library, implemented in C++ with Python bindings, for Gaussian Process regression useful for accounting for correlated noise in astronomical datasets, including those for transiting exoplanet discovery and characterization and stellar population modeling.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Heteroscedastic regression analysis of factors affecting BMD monitoring.
Sadatsafavi, Mohsen; Moayyeri, Alireza; Wang, Liqun; Leslie, William D
2008-11-01
Identifying factors affecting BMD precision and interindividual heterogeneity in BMD change can help optimize BMD monitoring. BMD change for the lumbar spine and total hip for short-term reproducibility (n = 328) and long-term clinical monitoring (n = 2720) populations were analyzed with heteroscedastic regression using linear prediction for mean (monitoring population only) and log-linear prediction for SD (both populations). For clinical monitoring, male sex, baseline body mass index (BMI), and systemic corticosteroid use were associated with greater SD of BMD change. Weight gain was negatively associated with SD for the hip, whereas height change was positively associated with SD for the spine. Each additional year of monitoring increased the SD by 6.5-9.2%. Osteoporosis treatment affected mean change but did not increase dispersion. For short-term reproducibility, performing scans on a different day increased the SD of measurement error by 38-44%. Baseline BMD, difference in bone area, and a repeat scan performed by different technologists were associated with higher measurement error only for the hip. For both samples, heteroscedastic regression outperformed models that assumed homogeneous variance. Heteroscedastic regression techniques are powerful yet underused tools in analyzing longitudinal BMD data and can be used to generate individualized predictions of BMD change and measurement error.
Streamflow forecasting using functional regression
NASA Astrophysics Data System (ADS)
Masselot, Pierre; Dabo-Niang, Sophie; Chebana, Fateh; Ouarda, Taha B. M. J.
2016-07-01
Streamflow, as a natural phenomenon, is continuous in time and so are the meteorological variables which influence its variability. In practice, it can be of interest to forecast the whole flow curve instead of points (daily or hourly). To this end, this paper introduces the functional linear models and adapts it to hydrological forecasting. More precisely, functional linear models are regression models based on curves instead of single values. They allow to consider the whole process instead of a limited number of time points or features. We apply these models to analyse the flow volume and the whole streamflow curve during a given period by using precipitations curves. The functional model is shown to lead to encouraging results. The potential of functional linear models to detect special features that would have been hard to see otherwise is pointed out. The functional model is also compared to the artificial neural network approach and the advantages and disadvantages of both models are discussed. Finally, future research directions involving the functional model in hydrology are presented.
[Understanding logistic regression].
El Sanharawi, M; Naudet, F
2013-10-01
Logistic regression is one of the most common multivariate analysis models utilized in epidemiology. It allows the measurement of the association between the occurrence of an event (qualitative dependent variable) and factors susceptible to influence it (explicative variables). The choice of explicative variables that should be included in the logistic regression model is based on prior knowledge of the disease physiopathology and the statistical association between the variable and the event, as measured by the odds ratio. The main steps for the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of the choice of variables that must be included and retained in the regression model in order to avoid the omission of important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate the logistic regression. One investigates the different risk factors in the apparition of coronary heart disease. It has been proposed in Chapter 5 of the book of D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
Ridge Regression: A Regression Procedure for Analyzing correlated Independent Variables
ERIC Educational Resources Information Center
Rakow, Ernest A.
1978-01-01
Ridge regression is a technique used to ameliorate the problem of highly correlated independent variables in multiple regression analysis. This paper explains the fundamentals of ridge regression and illustrates its use. (JKS)
NASA Astrophysics Data System (ADS)
Darnah
2016-04-01
Poisson regression has been used if the response variable is count data that based on the Poisson distribution. The Poisson distribution assumed equal dispersion. In fact, a situation where count data are over dispersion or under dispersion so that Poisson regression inappropriate because it may underestimate the standard errors and overstate the significance of the regression parameters, and consequently, giving misleading inference about the regression parameters. This paper suggests the generalized Poisson regression model to handling over dispersion and under dispersion on the Poisson regression model. The Poisson regression model and generalized Poisson regression model will be applied the number of filariasis cases in East Java. Based regression Poisson model the factors influence of filariasis are the percentage of families who don't behave clean and healthy living and the percentage of families who don't have a healthy house. The Poisson regression model occurs over dispersion so that we using generalized Poisson regression. The best generalized Poisson regression model showing the factor influence of filariasis is percentage of families who don't have healthy house. Interpretation of result the model is each additional 1 percentage of families who don't have healthy house will add 1 people filariasis patient.
Modern Regression Discontinuity Analysis
ERIC Educational Resources Information Center
Bloom, Howard S.
2012-01-01
This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…
Explorations in Statistics: Regression
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2011-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive…
Almost efficient estimation of relative risk regression
Fitzmaurice, Garrett M.; Lipsitz, Stuart R.; Arriaga, Alex; Sinha, Debajyoti; Greenberg, Caprice; Gawande, Atul A.
2014-01-01
Relative risks (RRs) are often considered the preferred measures of association in prospective studies, especially when the binary outcome of interest is common. In particular, many researchers regard RRs to be more intuitively interpretable than odds ratios. Although RR regression is a special case of generalized linear models, specifically with a log link function for the binomial (or Bernoulli) outcome, the resulting log-binomial regression does not respect the natural parameter constraints. Because log-binomial regression does not ensure that predicted probabilities are mapped to the [0,1] range, maximum likelihood (ML) estimation is often subject to numerical instability that leads to convergence problems. To circumvent these problems, a number of alternative approaches for estimating RR regression parameters have been proposed. One approach that has been widely studied is the use of Poisson regression estimating equations. The estimating equations for Poisson regression yield consistent, albeit inefficient, estimators of the RR regression parameters. We consider the relative efficiency of the Poisson regression estimator and develop an alternative, almost efficient estimator for the RR regression parameters. The proposed method uses near-optimal weights based on a Maclaurin series (Taylor series expanded around zero) approximation to the true Bernoulli or binomial weight function. This yields an almost efficient estimator while avoiding convergence problems. We examine the asymptotic relative efficiency of the proposed estimator for an increase in the number of terms in the series. Using simulations, we demonstrate the potential for convergence problems with standard ML estimation of the log-binomial regression model and illustrate how this is overcome using the proposed estimator. We apply the proposed estimator to a study of predictors of pre-operative use of beta blockers among patients undergoing colorectal surgery after diagnosis of colon cancer. PMID
Investigating bias in squared regression structure coefficients
Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce
2015-01-01
The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273
Investigating bias in squared regression structure coefficients.
Nimon, Kim F; Zientek, Linda R; Thompson, Bruce
2015-01-01
The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Astronomical Methods for Nonparametric Regression
NASA Astrophysics Data System (ADS)
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regressive Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Red and photographic infrared linear combinations for monitoring vegetation
NASA Technical Reports Server (NTRS)
Tucker, C. J.
1978-01-01
In situ collected spectrometer data were used to evaluate and quantify the relationships between various linear combinations of red and photographic infrared radiances and experimental plot biomass, leaf water content, and chlorophyll content. The radiance variables evaluated included the red and photographic infrared (IR) radiance and the linear combinations of the IR/red ratio, the square root of th IR/red difference, the vegetation index, and the transformed vegetation index. In addition, the corresponding green and red linear combinations were evaluated for comparative purposes. Three data sets were used from June, September, and October sampling periods. Regression analysis showed the increase utility of the IR and red linear combinations vis-a-vis the same green and red linear combinations. The red and IR linear combinations had 7% and 14% greater regression significance than the green and red linear combinations for the June and September sampling periods, respectively. The VI, TVI, and square root of the IR/red ration were the most significant followed closely by the IR/red ratio. Less than 6% difference separated the highest and lowest of these four IR and red linear combinations. The use of these linear combinations was shown to be sensitive primarily to the green leaf area or green leaf biomass.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.
Regression-kriging for characterizing soils with remotesensing data
NASA Astrophysics Data System (ADS)
Ge, Yufeng; Thomasson, J. Alex; Sui, Ruixiu; Wooten, James
2011-09-01
In precision agriculture regression has been used widely to quantify the relationship between soil attributes and other environmental variables. However, spatial correlation existing in soil samples usually violates a basic assumption of regression: sample independence. In this study, a regression-kriging method was attempted in relating soil properties to the remote sensing image of a cotton field near Vance, Mississippi, USA. The regression-kriging model was developed and tested by using 273 soil samples collected from the field. The result showed that by properly incorporating the spatial correlation information of regression residuals, the regression-kriging model generally achieved higher prediction accuracy than the stepwise multiple linear regression model. Most strikingly, a 50% increase in prediction accuracy was shown in soil sodium concentration. Potential usages of regression-kriging in future precision agriculture applications include real-time soil sensor development and digital soil mapping.
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Adaptive Local Linear Regression with Application to Printer Color Management
2008-01-01
values formed the test samples. This process guaranteed that the CIELAB test samples were in the gamut for each printer, but each printer had a...digital images has recently led to increased consumer demand for accurate color reproduction. Given a CIELAB color one would like to reproduce, the color...management problem is to determine what RGB color one must send the printer to minimize the error between the desired CIELAB color and the CIELAB
Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach
ERIC Educational Resources Information Center
Mesic, Vanes; Muratovic, Hasnija
2011-01-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
Using multiple linear regression model to estimate thunderstorm activity
NASA Astrophysics Data System (ADS)
Suparta, W.; Putro, W. S.
2017-03-01
This paper is aimed to develop a numerical model with the use of a nonlinear model to estimate the thunderstorm activity. Meteorological data such as Pressure (P), Temperature (T), Relative Humidity (H), cloud (C), Precipitable Water Vapor (PWV), and precipitation on a daily basis were used in the proposed method. The model was constructed with six configurations of input and one target output. The output tested in this work is the thunderstorm event when one-year data is used. Results showed that the model works well in estimating thunderstorm activities with the maximum epoch reaching 1000 iterations and the percent error was found below 50%. The model also found that the thunderstorm activities in May and October are detected higher than the other months due to the inter-monsoon season.
Parameter Estimation of a Tactical Missile using Linear Regression
2006-08-01
aerodynamic data for the 6- DoF missile model was based on a supersonic , tail controlled missile similar to an AIM-9X missile . Two command input types were...3.38) 3.3.7 Aerodynamic Data Generation The aerodynamic data for the 6-DoF missile model was based on a supersonic , tail controlled ...basic airframe aerodynamics consists of data on the missile configuration without the controls
Linear models: permutation methods
Cade, B.S.; Everitt, B.S.; Howell, D.C.
2005-01-01
Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well-known that estimates of the mean in linear model are extremely sensitive to even a single outlying value of the dependent variable compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution or responses, providing a more complete statistical picture that has relevance to many biological questions [6]...
Calculating a Stepwise Ridge Regression.
ERIC Educational Resources Information Center
Morris, John D.
1986-01-01
Although methods for using ordinary least squares regression computer programs to calculate a ridge regression are available, the calculation of a stepwise ridge regression requires a special purpose algorithm and computer program. The correct stepwise ridge regression procedure is given, and a parallel FORTRAN computer program is described.…
Marginal longitudinal semiparametric regression via penalized splines
Kadiri, M. Al; Carroll, R.J.; Wand, M.P.
2010-01-01
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been proposed, a relative simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models. PMID:21037941
Estimation of propensity scores using generalized additive models.
Woo, Mi-Ja; Reiter, Jerome P; Karr, Alan F
2008-08-30
Propensity score matching is often used in observational studies to create treatment and control groups with similar distributions of observed covariates. Typically, propensity scores are estimated using logistic regressions that assume linearity between the logistic link and the predictors. We evaluate the use of generalized additive models (GAMs) for estimating propensity scores. We compare logistic regressions and GAMs in terms of balancing covariates using simulation studies with artificial and genuine data. We find that, when the distributions of covariates in the treatment and control groups overlap sufficiently, using GAMs can improve overall covariate balance, especially for higher-order moments of distributions. When the distributions in the two groups overlap insufficiently, GAM more clearly reveals this fact than logistic regression does. We also demonstrate via simulation that matching with GAMs can result in larger reductions in bias when estimating treatment effects than matching with logistic regression.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
The "Smarter Regression" Add-In for Linear and Logistic Regression in Excel
2007-07-01
only Visual Basic , the built-in programming language of Excel (Walkenbach, 1999). We wanted to avoid the use of external Dynamic Linked Libraries...least two different ways of entering results into workbook cells in Visual Basic . One is to establish an array in Visual Basic , fill up the elements of
Batch Mode Active Learning for Regression With Expected Model Change.
Cai, Wenbin; Zhang, Muhan; Zhang, Ya
2016-04-20
While active learning (AL) has been widely studied for classification problems, limited efforts have been done on AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims at choosing the unlabeled data instances that result in the maximum change of the current model once labeled. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. In light of the stochastic gradient descent learning rule, we approximate the change as the gradient of the loss function with respect to each single candidate instance. Under the EMCM framework, we propose novel AL algorithms for the linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy when applied for k iterations, we further extend the algorithms to batch mode AL to simultaneously choose a set of k most informative instances at each query time. Extensive experimental results on both UCI and StatLib benchmark data sets have demonstrated that the proposed algorithms are highly effective and efficient.
Kramer, S.
1996-12-31
In many real-world domains the task of machine learning algorithms is to learn a theory for predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far no ILP algorithm except one can predict numbers and cope with nondeterminate background knowledge. (The only exception is a covering algorithm called FORS.) In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems. SRT integrates the statistical method of regression trees into ILP. It constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP systems cannot handle. Experiments in several real-world domains demonstrate that the approach is competitive with existing methods, indicating that the advantages are not at the expense of predictive accuracy.
Regression Commonality Analysis: A Technique for Quantitative Theory Building
ERIC Educational Resources Information Center
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
Quantile Regression in the Study of Developmental Sciences
ERIC Educational Resources Information Center
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
Current navigation requirements depend on a geometric dilution of precision (GDOP) criterion. As long as the GDOP stays below a specific value, navigation requirements are met. The GDOP will exceed the specified value when the measurement geometry becomes too collinear. A new signal processing technique, called Ridge Regression Processing, can reduce the effects of nearly collinear measurement geometry; thereby reducing the inflation of the measurement errors. It is shown that the Ridge signal processor gives a consistently better mean squared error (MSE) in position than the Ordinary Least Mean Squares (OLS) estimator. The applicability of this technique is currently being investigated to improve the following areas: receiver autonomous integrity monitoring (RAIM), coverage requirements, availability requirements, and precision approaches.
NASA Astrophysics Data System (ADS)
Sidorin, Anatoly
2010-01-01
In linear accelerators the particles are accelerated by either electrostatic fields or oscillating Radio Frequency (RF) fields. Accordingly the linear accelerators are divided in three large groups: electrostatic, induction and RF accelerators. Overview of the different types of accelerators is given. Stability of longitudinal and transverse motion in the RF linear accelerators is briefly discussed. The methods of beam focusing in linacs are described.
Spatial vulnerability assessments by regression kriging
NASA Astrophysics Data System (ADS)
Pásztor, László; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor
2016-04-01
information representing IEW or GRP forming environmental factors were taken into account to support the spatial inference of the locally experienced IEW frequency and measured GRP values respectively. An efficient spatial prediction methodology was applied to construct reliable maps, namely regression kriging (RK) using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. Firstly the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which are interpolated by kriging. The final map is the sum of the two component predictions. Application of RK also provides the possibility of inherent accuracy assessment. The resulting maps are characterized by global and local measures of its accuracy. Additionally the method enables interval estimation for spatial extension of the areas of predefined risk categories. All of these outputs provide useful contribution to spatial planning, action planning and decision making. Acknowledgement: Our work was partly supported by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).
Variable Selection in Semiparametric Regression Modeling.
Li, Runze; Liang, Hua
2008-01-01
In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for nonparametric components and select significant variables for parametric portion. Thus, it is much more challenging than that for parametric models such as linear models and generalized linear models because traditional variable selection procedures including stepwise regression and the best subset selection require model selection to nonparametric components for each submodel. This leads to very heavy computational burden. In this paper, we propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood. The newly proposed procedures are distinguished from the traditional ones in that they delete insignificant variables and estimate the coefficients of significant variables simultaneously. This allows us to establish the sampling properties of the resulting estimate. We first establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we then establish the asymptotic normality of the resulting estimate, and further demonstrate that the proposed procedures perform as well as an oracle procedure. Semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and demonstrate its limiting null distribution follows a chi-squared distribution, which is independent of the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedures.
Orbe, Jesus; Ferreira, Eva; Núñez-Antón, Vicente
2003-01-01
In this work we study the effect of several covariates on a censored response variable with unknown probability distribution. A semiparametric model is proposed to consider situations where the functional form of the effect of one or more covariates is unknown, as is the case in the application presented in this work. We provide its estimation procedure and, in addition, a bootstrap technique to make inference on the parameters. A simulation study has been carried out to show the good performance of the proposed estimation process and to analyse the effect of the censorship. Finally, we present the results when the methodology is applied to AIDS diagnosed patients.
Prediction of dynamical systems by symbolic regression
NASA Astrophysics Data System (ADS)
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
Prediction of dynamical systems by symbolic regression.
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K; Noack, Bernd R
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
Regression based modeling of vegetation and climate variables for the Amazon rainforests
NASA Astrophysics Data System (ADS)
Kodali, A.; Khandelwal, A.; Ganguly, S.; Bongard, J.; Das, K.
2015-12-01
appropriate function and terminal set choices for the symbolic regression based modeling of the effects of climate on Amazon vegetation. Additionally, we compare the predictive capability of the symbolic regression based model to baseline techniques such as linear regularized regression and support vector regression.
Error bounds in cascading regressions
Karlinger, M.R.; Troutman, B.M.
1985-01-01
Cascading regressions is a technique for predicting a value of a dependent variable when no paired measurements exist to perform a standard regression analysis. Biases in coefficients of a cascaded-regression line as well as error variance of points about the line are functions of the correlation coefficient between dependent and independent variables. Although this correlation cannot be computed because of the lack of paired data, bounds can be placed on errors through the required properties of the correlation coefficient. The potential meansquared error of a cascaded-regression prediction can be large, as illustrated through an example using geomorphologic data. ?? 1985 Plenum Publishing Corporation.
40 CFR 1066.220 - Linearity verification.
Code of Federal Regulations, 2012 CFR
2012-07-01
...-squares linear regression and the linearity criteria specified in Table 1 of this section. (b) Performance requirements. If a measurement system does not meet the applicable linearity criteria in Table 1 of this... system at the specified temperatures and pressures. This may include any specified adjustment or...
Incremental learning for ν-Support Vector Regression.
Gu, Bin; Sheng, Victor S; Wang, Zhijie; Ho, Derek; Osman, Said; Li, Shuo
2015-07-01
The ν-Support Vector Regression (ν-SVR) is an effective regression learning algorithm, which has the advantage of using a parameter ν on controlling the number of support vectors and adjusting the width of the tube automatically. However, compared to ν-Support Vector Classification (ν-SVC) (Schölkopf et al., 2000), ν-SVR introduces an additional linear term into its objective function. Thus, directly applying the accurate on-line ν-SVC algorithm (AONSVM) to ν-SVR will not generate an effective initial solution. It is the main challenge to design an incremental ν-SVR learning algorithm. To overcome this challenge, we propose a special procedure called initial adjustments in this paper. This procedure adjusts the weights of ν-SVC based on the Karush-Kuhn-Tucker (KKT) conditions to prepare an initial solution for the incremental learning. Combining the initial adjustments with the two steps of AONSVM produces an exact and effective incremental ν-SVR learning algorithm (INSVR). Theoretical analysis has proven the existence of the three key inverse matrices, which are the cornerstones of the three steps of INSVR (including the initial adjustments), respectively. The experiments on benchmark datasets demonstrate that INSVR can avoid the infeasible updating paths as far as possible, and successfully converges to the optimal solution. The results also show that INSVR is faster than batch ν-SVR algorithms with both cold and warm starts.
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
NASA Astrophysics Data System (ADS)
Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.
2015-11-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water.
On regression adjustment for the propensity score.
Vansteelandt, S; Daniel, R M
2014-10-15
Propensity scores are widely adopted in observational research because they enable adjustment for high-dimensional confounders without requiring models for their association with the outcome of interest. The results of statistical analyses based on stratification, matching or inverse weighting by the propensity score are therefore less susceptible to model extrapolation than those based solely on outcome regression models. This is attractive because extrapolation in outcome regression models may be alarming, yet difficult to diagnose, when the exposed and unexposed individuals have very different covariate distributions. Standard regression adjustment for the propensity score forms an alternative to the aforementioned propensity score methods, but the benefits of this are less clear because it still involves modelling the outcome in addition to the propensity score. In this article, we develop novel insights into the properties of this adjustment method. We demonstrate that standard tests of the null hypothesis of no exposure effect (based on robust variance estimators), as well as particular standardised effects obtained from such adjusted regression models, are robust against misspecification of the outcome model when a propensity score model is correctly specified; they are thus not vulnerable to the aforementioned problem of extrapolation. We moreover propose efficient estimators for these standardised effects, which retain a useful causal interpretation even when the propensity score model is misspecified, provided the outcome regression model is correctly specified.
Poisson Regression Analysis of Illness and Injury Surveillance Data
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Additive interaction between heterogeneous environmental ...
BACKGROUND Environmental exposures often occur in tandem; however, epidemiological research often focuses on singular exposures. Statistical interactions among broad, well-characterized environmental domains have not yet been evaluated in association with health. We address this gap by conducting a county-level cross-sectional analysis of interactions between Environmental Quality Index (EQI) domain indices on preterm birth in the Unites States from 2000-2005.METHODS: The EQI, a county-level index constructed for the 2000-2005 time period, was constructed from five domain-specific indices (air, water, land, built and sociodemographic) using principal component analyses. County-level preterm birth rates (n=3141) were estimated using live births from the National Center for Health Statistics. Linear regression was used to estimate prevalence differences (PD) and 95% confidence intervals (CI) comparing worse environmental quality to the better quality for each model for a) each individual domain main effect b) the interaction contrast and c) the two main effects plus interaction effect (i.e. the “net effect”) to show departure from additive interaction for the all U.S counties. Analyses were also performed for subgroupings by four urban/rural strata. RESULTS: We found the suggestion of antagonistic interactions but no synergism, along with several purely additive (i.e., no interaction) associations. In the non-stratified model, we observed antagonistic interac
Christofilos, N.C.; Polk, I.J.
1959-02-17
Improvements in linear particle accelerators are described. A drift tube system for a linear ion accelerator reduces gap capacity between adjacent drift tube ends. This is accomplished by reducing the ratio of the diameter of the drift tube to the diameter of the resonant cavity. Concentration of magnetic field intensity at the longitudinal midpoint of the external sunface of each drift tube is reduced by increasing the external drift tube diameter at the longitudinal center region.
ERIC Educational Resources Information Center
Kaplan, David
2005-01-01
This article considers the problem of estimating dynamic linear regression models when the data are generated from finite mixture probability density function where the mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single…
A flexible count data regression model for risk analysis.
Guikema, Seth D; Coffelt, Jeremy P; Goffelt, Jeremy P
2008-02-01
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can provide as good of fits to data as the commonly used existing models for overdispered data sets while outperforming these commonly used models for underdispersed data sets.
ERIC Educational Resources Information Center
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Robust regression with asymmetric heavy-tail noise distributions.
Takeuchi, Ichiro; Bengio, Yoshua; Kanamori, Takafumi
2002-10-01
In the presence of a heavy-tail noise distribution, regression becomes much more difficult. Traditional robust regression methods assume that the noise distribution is symmetric, and they downweight the influence of so-called outliers. When the noise distribution is asymmetric, these methods yield biased regression estimators. Motivated by data-mining problems for the insurance industry, we propose a new approach to robust regression tailored to deal with asymmetric noise distribution. The main idea is to learn most of the parameters of the model using conditional quantile estimators (which are biased but robust estimators of the regression) and to learn a few remaining parameters to combine and correct these estimators, to minimize the average squared error in an unbiased way. Theoretical analysis and experiments show the clear advantages of the approach. Results are on artificial data as well as insurance data, using both linear and neural network predictors.
Quantiles Regression Approach to Identifying the Determinant of Breastfeeding Duration
NASA Astrophysics Data System (ADS)
Mahdiyah; Norsiah Mohamed, Wan; Ibrahim, Kamarulzaman
In this study, quantiles regression approach is applied to the data of Malaysian Family Life Survey (MFLS), to identify factors which are significantly related to the different conditional quantiles of the breastfeeding duration. It is known that the classical linear regression methods are based on minimizing residual sum of squared, but quantiles regression use a mechanism which are based on the conditional median function and the full range of other conditional quantile functions. Overall, it is found that the period of breastfeeding is significantly related to place of living, religion and total number of children in the family.
Using ridge regression in systematic pointing error corrections
NASA Technical Reports Server (NTRS)
Guiar, C. N.
1988-01-01
A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate
Modeling confounding by half-sibling regression
Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154
Modeling confounding by half-sibling regression.
Schölkopf, Bernhard; Hogg, David W; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-07-05
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application.
On the linear programming bound for linear Lee codes.
Astola, Helena; Tabus, Ioan
2016-01-01
Based on an invariance-type property of the Lee-compositions of a linear Lee code, additional equality constraints can be introduced to the linear programming problem of linear Lee codes. In this paper, we formulate this property in terms of an action of the multiplicative group of the field [Formula: see text] on the set of Lee-compositions. We show some useful properties of certain sums of Lee-numbers, which are the eigenvalues of the Lee association scheme, appearing in the linear programming problem of linear Lee codes. Using the additional equality constraints, we formulate the linear programming problem of linear Lee codes in a very compact form, leading to a fast execution, which allows to efficiently compute the bounds for large parameter values of the linear codes.
HT-FRTC: a fast radiative transfer code using kernel regression
NASA Astrophysics Data System (ADS)
Thelen, Jean-Claude; Havemann, Stephan; Lewis, Warren
2016-09-01
The HT-FRTC is a principal component based fast radiative transfer code that can be used across the electromagnetic spectrum from the microwave through to the ultraviolet to calculate transmittance, radiance and flux spectra. The principal components cover the spectrum at a very high spectral resolution, which allows very fast line-by-line, hyperspectral and broadband simulations for satellite-based, airborne and ground-based sensors. The principal components are derived during a code training phase from line-by-line simulations for a diverse set of atmosphere and surface conditions. The derived principal components are sensor independent, i.e. no extra training is required to include additional sensors. During the training phase we also derive the predictors which are required by the fast radiative transfer code to determine the principal component scores from the monochromatic radiances (or fluxes, transmittances). These predictors are calculated for each training profile at a small number of frequencies, which are selected by a k-means cluster algorithm during the training phase. Until recently the predictors were calculated using a linear regression. However, during a recent rewrite of the code the linear regression was replaced by a Gaussian Process (GP) regression which resulted in a significant increase in accuracy when compared to the linear regression. The HT-FRTC has been trained with a large variety of gases, surface properties and scatterers. Rayleigh scattering as well as scattering by frozen/liquid clouds, hydrometeors and aerosols have all been included. The scattering phase function can be fully accounted for by an integrated line-by-line version of the Edwards-Slingo spherical harmonics radiation code or approximately by a modification to the extinction (Chou scaling).
Spencer, Michael
1974-01-01
Food additives are discussed from the food technology point of view. The reasons for their use are summarized: (1) to protect food from chemical and microbiological attack; (2) to even out seasonal supplies; (3) to improve their eating quality; (4) to improve their nutritional value. The various types of food additives are considered, e.g. colours, flavours, emulsifiers, bread and flour additives, preservatives, and nutritional additives. The paper concludes with consideration of those circumstances in which the use of additives is (a) justified and (b) unjustified. PMID:4467857
Multiple Regression and Its Discontents
ERIC Educational Resources Information Center
Snell, Joel C.; Marsh, Mitchell
2012-01-01
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
The Use of Linear Programming for Prediction.
ERIC Educational Resources Information Center
Schnittjer, Carl J.
The purpose of the study was to develop a linear programming model to be used for prediction, test the accuracy of the predictions, and compare the accuracy with that produced by curvilinear multiple regression analysis. (Author)
Reasons for Hierarchical Linear Modeling: A Reminder.
ERIC Educational Resources Information Center
Wang, Jianjun
1999-01-01
Uses examples of hierarchical linear modeling (HLM) at local and national levels to illustrate proper applications of HLM and dummy variable regression. Raises cautions about the circumstances under which hierarchical data do not need HLM. (SLD)
Ahearn, Elizabeth A.
2010-01-01
Multiple linear regression equations for determining flow-duration statistics were developed to estimate select flow exceedances ranging from 25- to 99-percent for six 'bioperiods'-Salmonid Spawning (November), Overwinter (December-February), Habitat Forming (March-April), Clupeid Spawning (May), Resident Spawning (June), and Rearing and Growth (July-October)-in Connecticut. Regression equations also were developed to estimate the 25- and 99-percent flow exceedances without reference to a bioperiod. In total, 32 equations were developed. The predictive equations were based on regression analyses relating flow statistics from streamgages to GIS-determined basin and climatic characteristics for the drainage areas of those streamgages. Thirty-nine streamgages (and an additional 6 short-term streamgages and 28 partial-record sites for the non-bioperiod 99-percent exceedance) in Connecticut and adjacent areas of neighboring States were used in the regression analysis. Weighted least squares regression analysis was used to determine the predictive equations; weights were assigned based on record length. The basin characteristics-drainage area, percentage of area with coarse-grained stratified deposits, percentage of area with wetlands, mean monthly precipitation (November), mean seasonal precipitation (December, January, and February), and mean basin elevation-are used as explanatory variables in the equations. Standard errors of estimate of the 32 equations ranged from 10.7 to 156 percent with medians of 19.2 and 55.4 percent to predict the 25- and 99-percent exceedances, respectively. Regression equations to estimate high and median flows (25- to 75-percent exceedances) are better predictors (smaller variability of the residual values around the regression line) than the equations to estimate low flows (less than 75-percent exceedance). The Habitat Forming (March-April) bioperiod had the smallest standard errors of estimate, ranging from 10.7 to 20.9 percent. In
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Model building in nonproportional hazard regression.
Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad
2013-12-30
Recent developments of statistical methods allow for a very flexible modeling of covariates affecting survival times via the hazard rate, including also the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, for building the most suitable model for given data. This is particularly true when potentially time-varying associations are given. We propose to conduct a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on of the Poisson likelihood to make recent advances for model building in exponential family regression accessible also in the nonproportional hazard regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses.
Functional Generalized Additive Models.
McLean, Mathew W; Hooker, Giles; Staicu, Ana-Maria; Scheipl, Fabian; Ruppert, David
2014-01-01
We introduce the functional generalized additive model (FGAM), a novel regression model for association studies between a scalar response and a functional predictor. We model the link-transformed mean response as the integral with respect to t of F{X(t), t} where F(·,·) is an unknown regression function and X(t) is a functional covariate. Rather than having an additive model in a finite number of principal components as in Müller and Yao (2008), our model incorporates the functional predictor directly and thus our model can be viewed as the natural functional extension of generalized additive models. We estimate F(·,·) using tensor-product B-splines with roughness penalties. A pointwise quantile transformation of the functional predictor is also considered to ensure each tensor-product B-spline has observed data on its support. The methods are evaluated using simulated data and their predictive performance is compared with other competing scalar-on-function regression alternatives. We illustrate the usefulness of our approach through an application to brain tractography, where X(t) is a signal from diffusion tensor imaging at position, t, along a tract in the brain. In one example, the response is disease-status (case or control) and in a second example, it is the score on a cognitive test. R code for performing the simulations and fitting the FGAM can be found in supplemental materials available online.
Estimating peak flow characteristics at ungaged sites by ridge regression
Tasker, Gary D.
1982-01-01
A regression simulation model, is combined with a multisite streamflow generator to simulate a regional regression of 50-year peak discharge against a set of basin characteristics. Monte Carlo experiments are used to compare the unbiased ordinary lease squares parameter estimator with Hoerl and Kennard's (1970a) ridge estimator in which the biasing parameter is that proposed by Hoerl, Kennard, and Baldwin (1975). The simulation results indicate a substantial improvement in parameter estimation using ridge regression when the correlation between basin characteristics is more than about 0.90. In addition, results indicate a strong potential for improving the mean square error of prediction of a peak-flow characteristic versus basin characteristics regression model when the basin characteristics are approximately colinear. The simulation covers a range of regression parameters, streamflow statistics, and basin characteristics commonly found in regional regression studies.
Tosteson, Tor D; Buzas, Jeffrey S; Demidenko, Eugene; Karagas, Margaret
2003-04-15
Covariate measurement error is often a feature of scientific data used for regression modelling. The consequences of such errors include a loss of power of tests of significance for the regression parameters corresponding to the true covariates. Power and sample size calculations that ignore covariate measurement error tend to overestimate power and underestimate the actual sample size required to achieve a desired power. In this paper we derive a novel measurement error corrected power function for generalized linear models using a generalized score test based on quasi-likelihood methods. Our power function is flexible in that it is adaptable to designs with a discrete or continuous scalar covariate (exposure) that can be measured with or without error, allows for additional confounding variables and applies to a broad class of generalized regression and measurement error models. A program is described that provides sample size or power for a continuous exposure with a normal measurement error model and a single normal confounder variable in logistic regression. We demonstrate the improved properties of our power calculations with simulations and numerical studies. An example is given from an ongoing study of cancer and exposure to arsenic as measured by toenail concentrations and tap water samples.
XRA image segmentation using regression
NASA Astrophysics Data System (ADS)
Jin, Jesse S.
1996-04-01
Segmentation is an important step in image analysis. Thresholding is one of the most important approaches. There are several difficulties in segmentation, such as automatic selecting threshold, dealing with intensity distortion and noise removal. We have developed an adaptive segmentation scheme by applying the Central Limit Theorem in regression. A Gaussian regression is used to separate the distribution of background from foreground in a single peak histogram. The separation will help to automatically determine the threshold. A small 3 by 3 widow is applied and the modal of the local histogram is used to overcome noise. Thresholding is based on local weighting, where regression is used again for parameter estimation. A connectivity test is applied to the final results to remove impulse noise. We have applied the algorithm to x-ray angiogram images to extract brain arteries. The algorithm works well for single peak distribution where there is no valley in the histogram. The regression provides a method to apply knowledge in clustering. Extending regression for multiple-level segmentation needs further investigation.
Interactive natural image segmentation via spline regression.
Xiang, Shiming; Nie, Feiping; Zhang, Chunxia; Zhang, Changshui
2009-07-01
This paper presents an interactive algorithm for segmentation of natural images. The task is formulated as a problem of spline regression, in which the spline is derived in Sobolev space and has a form of a combination of linear and Green's functions. Besides its nonlinear representation capability, one advantage of this spline in usage is that, once it has been constructed, no parameters need to be tuned to data. We define this spline on the user specified foreground and background pixels, and solve its parameters (the combination coefficients of functions) from a group of linear equations. To speed up spline construction, K-means clustering algorithm is employed to cluster the user specified pixels. By taking the cluster centers as representatives, this spline can be easily constructed. The foreground object is finally cut out from its background via spline interpolation. The computational complexity of the proposed algorithm is linear in the number of the pixels to be segmented. Experiments on diverse natural images, with comparison to existing algorithms, illustrate the validity of our method.
Demonstration of a Fiber Optic Regression Probe
NASA Technical Reports Server (NTRS)
Korman, Valentin; Polzin, Kurt A.
2010-01-01
The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while the second and third are increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics makes it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers changes, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for
Bayesian regression models for the estimation of net cost of disease using aggregate data.
Mitsakakis, Nicholas; Tomlinson, George
2015-01-23
Estimation of net costs attributed to a disease or other health condition is very important for health economists and policy makers. Skewness and heteroscedasticity are well-known characteristics for cost data, making linear models generally inappropriate and dictating the use of other types of models, such as gamma regression. Additional hurdles emerge when individual level data are not available. In this paper, we consider the latter case were data are only available at the aggregate level, containing means and standard deviations for different strata defined by a number of demographic and clinical factors. We summarize a number of methods that can be used for this estimation, and we propose a Bayesian approach that utilizes the sample stratum specific standard deviations as stochastic. We investigate the performance of two linear mixed models, comparing them with two proposed gamma regression mixed models, to analyze simulated data generated by gamma and log-normal distributions. Our proposed Bayesian approach seems to have significant advantages for net cost estimation when only aggregate data are available. The implemented gamma models do not seem to offer the expected benefits over the linear models; however, further investigation and refinement is needed.
Notes sur les mouvements recursifs (Notes on Regressive Moves).
ERIC Educational Resources Information Center
Auchlin, Antoine; And Others
1981-01-01
Examines the phenomenon of regressive moves (retro-interpretation) in the light of a hypothesis according to which the formation of complex and hierarchically organized conversation units is subordinated to the linearity of discourse. Analyzes a transactional exchange, describing the interplay of integration, anticipation, and retro-interpretation…
Meta-Regression Approximations to Reduce Publication Selection Bias
ERIC Educational Resources Information Center
Stanley, T. D.; Doucouliagos, Hristos
2014-01-01
Publication selection bias is a serious challenge to the integrity of all empirical sciences. We derive meta-regression approximations to reduce this bias. Our approach employs Taylor polynomial approximations to the conditional mean of a truncated distribution. A quadratic approximation without a linear term, precision-effect estimate with…
A Demonstration of Regression False Positive Selection in Data Mining
ERIC Educational Resources Information Center
Pinder, Jonathan P.
2014-01-01
Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…
Logarithmic Transformations in Regression: Do You Transform Back Correctly?
ERIC Educational Resources Information Center
Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.
2009-01-01
The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
Kernel regression for fMRI pattern prediction
Chu, Carlton; Ni, Yizhao; Tan, Geoffrey; Saunders, Craig J.; Ashburner, John
2011-01-01
This paper introduces two kernel-based regression schemes to decode or predict brain states from functional brain scans as part of the Pittsburgh Brain Activity Interpretation Competition (PBAIC) 2007, in which our team was awarded first place. Our procedure involved image realignment, spatial smoothing, detrending of low-frequency drifts, and application of multivariate linear and non-linear kernel regression methods: namely kernel ridge regression (KRR) and relevance vector regression (RVR). RVR is based on a Bayesian framework, which automatically determines a sparse solution through maximization of marginal likelihood. KRR is the dual-form formulation of ridge regression, which solves regression problems with high dimensional data in a computationally efficient way. Feature selection based on prior knowledge about human brain function was also used. Post-processing by constrained deconvolution and re-convolution was used to furnish the prediction. This paper also contains a detailed description of how prior knowledge was used to fine tune predictions of specific “feature ratings,” which we believe is one of the key factors in our prediction accuracy. The impact of pre-processing was also evaluated, demonstrating that different pre-processing may lead to significantly different accuracies. Although the original work was aimed at the PBAIC, many techniques described in this paper can be generally applied to any fMRI decoding works to increase the prediction accuracy. PMID:20348000
Regressive evolution in Astyanax cavefish.
Jeffery, William R
2009-01-01
A diverse group of animals, including members of most major phyla, have adapted to life in the perpetual darkness of caves. These animals are united by the convergence of two regressive phenotypes, loss of eyes and pigmentation. The mechanisms of regressive evolution are poorly understood. The teleost Astyanax mexicanus is of special significance in studies of regressive evolution in cave animals. This species includes an ancestral surface dwelling form and many con-specific cave-dwelling forms, some of which have evolved their recessive phenotypes independently. Recent advances in Astyanax development and genetics have provided new information about how eyes and pigment are lost during cavefish evolution; namely, they have revealed some of the molecular and cellular mechanisms involved in trait modification, the number and identity of the underlying genes and mutations, the molecular basis of parallel evolution, and the evolutionary forces driving adaptation to the cave environment.
... or natural. Natural food additives include: Herbs or spices to add flavor to foods Vinegar for pickling ... Certain colors improve the appearance of foods. Many spices, as well as natural and man-made flavors, ...
Joint regression analysis of correlated data using Gaussian copulas.
Song, Peter X-K; Li, Mingyao; Yuan, Ying
2009-03-01
This article concerns a new joint modeling approach for correlated data analysis. Utilizing Gaussian copulas, we present a unified and flexible machinery to integrate separate one-dimensional generalized linear models (GLMs) into a joint regression analysis of continuous, discrete, and mixed correlated outcomes. This essentially leads to a multivariate analogue of the univariate GLM theory and hence an efficiency gain in the estimation of regression coefficients. The availability of joint probability models enables us to develop a full maximum likelihood inference. Numerical illustrations are focused on regression models for discrete correlated data, including multidimensional logistic regression models and a joint model for mixed normal and binary outcomes. In the simulation studies, the proposed copula-based joint model is compared to the popular generalized estimating equations, which is a moment-based estimating equation method to join univariate GLMs. Two real-world data examples are used in the illustration.
Multiple Regression: A Leisurely Primer.
ERIC Educational Resources Information Center
Daniel, Larry G.; Onwuegbuzie, Anthony J.
Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researchers is interested in studies of predictability of phenomena of interest. This paper provides an introduction to…
Weighting Regressions by Propensity Scores
ERIC Educational Resources Information Center
Freedman, David A.; Berk, Richard A.
2008-01-01
Regressions can be weighted by propensity scores in order to reduce bias. However, weighting is likely to increase random error in the estimates, and to bias the estimated standard errors downward, even when selection mechanisms are well understood. Moreover, in some cases, weighting will increase the bias in estimated causal parameters. If…
Quantile Regression with Censored Data
ERIC Educational Resources Information Center
Lin, Guixian
2009-01-01
The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitation due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…
Colgate, S.A.
1958-05-27
An improvement is presented in linear accelerators for charged particles with respect to the stable focusing of the particle beam. The improvement consists of providing a radial electric field transverse to the accelerating electric fields and angularly introducing the beam of particles in the field. The results of the foregoing is to achieve a beam which spirals about the axis of the acceleration path. The combination of the electric fields and angular motion of the particles cooperate to provide a stable and focused particle beam.
Neither fixed nor random: weighted least squares meta-regression.
Stanley, T D; Doucouliagos, Hristos
2017-03-01
Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how and explain why an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias that is as good as FE-MRA in all cases and better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd.
Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors
Woodard, Dawn B.; Crainiceanu, Ciprian; Ruppert, David
2013-01-01
We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials. PMID:24293988
Monthly streamflow forecasting using Gaussian Process Regression
NASA Astrophysics Data System (ADS)
Sun, Alexander Y.; Wang, Dingbao; Xu, Xianli
2014-04-01
Streamflow forecasting plays a critical role in nearly all aspects of water resources planning and management. In this work, Gaussian Process Regression (GPR), an effective kernel-based machine learning algorithm, is applied to probabilistic streamflow forecasting. GPR is built on Gaussian process, which is a stochastic process that generalizes multivariate Gaussian distribution to infinite-dimensional space such that distributions over function values can be defined. The GPR algorithm provides a tractable and flexible hierarchical Bayesian framework for inferring the posterior distribution of streamflows. The prediction skill of the algorithm is tested for one-month-ahead prediction using the MOPEX database, which includes long-term hydrometeorological time series collected from 438 basins across the U.S. from 1948 to 2003. Comparisons with linear regression and artificial neural network models indicate that GPR outperforms both regression methods in most cases. The GPR prediction of MOPEX basins is further examined using the Budyko framework, which helps to reveal the close relationships among water-energy partitions, hydrologic similarity, and predictability. Flow regime modification and the resulting loss of predictability have been a major concern in recent years because of climate change and anthropogenic activities. The persistence of streamflow predictability is thus examined by extending the original MOPEX data records to 2012. Results indicate relatively strong persistence of streamflow predictability in the extended period, although the low-predictability basins tend to show more variations. Because many low-predictability basins are located in regions experiencing fast growth of human activities, the significance of sustainable development and water resources management can be even greater for those regions.
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
Quantile Regression Models for Current Status Data.
Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen
2016-11-01
Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging.
Shape regression for vertebra fracture quantification
NASA Astrophysics Data System (ADS)
Lund, Michael Tillge; de Bruijne, Marleen; Tanko, Laszlo B.; Nielsen, Mads
2005-04-01
Accurate and reliable identification and quantification of vertebral fractures constitute a challenge both in clinical trials and in diagnosis of osteoporosis. Various efforts have been made to develop reliable, objective, and reproducible methods for assessing vertebral fractures, but at present there is no consensus concerning a universally accepted diagnostic definition of vertebral fractures. In this project we want to investigate whether or not it is possible to accurately reconstruct the shape of a normal vertebra, using a neighbouring vertebra as prior information. The reconstructed shape can then be used to develop a novel vertebra fracture measure, by comparing the segmented vertebra shape with its reconstructed normal shape. The vertebrae in lateral x-rays of the lumbar spine were manually annotated by a medical expert. With this dataset we built a shape model, with equidistant point distribution between the four corner points. Based on the shape model, a multiple linear regression model of a normal vertebra shape was developed for each dataset using leave-one-out cross-validation. The reconstructed shape was calculated for each dataset using these regression models. The average prediction error for the annotated shape was on average 3%.
NASA Technical Reports Server (NTRS)
2006-01-01
[figure removed for brevity, see original site] Context image for PIA03667 Linear Clouds
These clouds are located near the edge of the south polar region. The cloud tops are the puffy white features in the bottom half of the image.
Image information: VIS instrument. Latitude -80.1N, Longitude 52.1E. 17 meter/pixel resolution.
Note: this THEMIS visual image has not been radiometrically nor geometrically calibrated for this preliminary release. An empirical correction has been performed to remove instrumental effects. A linear shift has been applied in the cross-track and down-track direction to approximate spacecraft and planetary motion. Fully calibrated and geometrically projected images will be released through the Planetary Data System in accordance with Project policies at a later time.
NASA's Jet Propulsion Laboratory manages the 2001 Mars Odyssey mission for NASA's Office of Space Science, Washington, D.C. The Thermal Emission Imaging System (THEMIS) was developed by Arizona State University, Tempe, in collaboration with Raytheon Santa Barbara Remote Sensing. The THEMIS investigation is led by Dr. Philip Christensen at Arizona State University. Lockheed Martin Astronautics, Denver, is the prime contractor for the Odyssey project, and developed and built the orbiter. Mission operations are conducted jointly from Lockheed Martin and from JPL, a division of the California Institute of Technology in Pasadena.
Deep Wavelet Scattering for Quantum Energy Regression
NASA Astrophysics Data System (ADS)
Hirn, Matthew
Physical functionals are usually computed as solutions of variational problems or from solutions of partial differential equations, which may require huge computations for complex systems. Quantum chemistry calculations of ground state molecular energies is such an example. Indeed, if x is a quantum molecular state, then the ground state energy E0 (x) is the minimum eigenvalue solution of the time independent Schrödinger Equation, which is computationally intensive for large systems. Machine learning algorithms do not simulate the physical system but estimate solutions by interpolating values provided by a training set of known examples {(xi ,E0 (xi) } i <= n . However, precise interpolations may require a number of examples that is exponential in the system dimension, and are thus intractable. This curse of dimensionality may be circumvented by computing interpolations in smaller approximation spaces, which take advantage of physical invariants. Linear regressions of E0 over a dictionary Φ ={ϕk } k compute an approximation E 0 as: E 0 (x) =∑kwkϕk (x) , where the weights {wk } k are selected to minimize the error between E0 and E 0 on the training set. The key to such a regression approach then lies in the design of the dictionary Φ. It must be intricate enough to capture the essential variability of E0 (x) over the molecular states x of interest, while simple enough so that evaluation of Φ (x) is significantly less intensive than a direct quantum mechanical computation (or approximation) of E0 (x) . In this talk we present a novel dictionary Φ for the regression of quantum mechanical energies based on the scattering transform of an intermediate, approximate electron density representation ρx of the state x. The scattering transform has the architecture of a deep convolutional network, composed of an alternating sequence of linear filters and nonlinear maps. Whereas in many deep learning tasks the linear filters are learned from the training data, here
Regression Verification Using Impact Summaries
NASA Technical Reports Server (NTRS)
Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana
2013-01-01
Regression verification techniques are used to prove equivalence of syntactically similar programs. Checking equivalence of large programs, however, can be computationally expensive. Existing regression verification techniques rely on abstraction and decomposition techniques to reduce the computational effort of checking equivalence of the entire program. These techniques are sound but not complete. In this work, we propose a novel approach to improve scalability of regression verification by classifying the program behaviors generated during symbolic execution as either impacted or unimpacted. Our technique uses a combination of static analysis and symbolic execution to generate summaries of impacted program behaviors. The impact summaries are then checked for equivalence using an o-the-shelf decision procedure. We prove that our approach is both sound and complete for sequential programs, with respect to the depth bound of symbolic execution. Our evaluation on a set of sequential C artifacts shows that reducing the size of the summaries can help reduce the cost of software equivalence checking. Various reduction, abstraction, and compositional techniques have been developed to help scale software verification techniques to industrial-sized systems. Although such techniques have greatly increased the size and complexity of systems that can be checked, analysis of large software systems remains costly. Regression analysis techniques, e.g., regression testing [16], regression model checking [22], and regression verification [19], restrict the scope of the analysis by leveraging the differences between program versions. These techniques are based on the idea that if code is checked early in development, then subsequent versions can be checked against a prior (checked) version, leveraging the results of the previous analysis to reduce analysis cost of the current version. Regression verification addresses the problem of proving equivalence of closely related program
Chen, Guo-Bo
2014-01-01
Exploring heritability of complex traits is a central focus of statistical genetics. Among various previously proposed methods to estimate heritability, variance component methods are advantageous when estimating heritability using markers. Due to the high-dimensional nature of data obtained from genome-wide association studies (GWAS) in which genetic architecture is often unknown, the most appropriate heritability estimator model is often unclear. The Haseman–Elston (HE) regression is a variance component method that was initially only proposed for linkage studies. However, this study presents a theoretical basis for a modified HE that models linkage disequilibrium for a quantitative trait, and consequently can be used for GWAS. After replacing identical by descent (IBD) scores with identity by state (IBS) scores, we applied the IBS-based HE regression to single-marker association studies (scenario I) and estimated the variance component using multiple markers (scenario II). In scenario II, we discuss the circumstances in which the HE regression and the mixed linear model are equivalent; the disparity between these two methods is observed when a covariance component exists for the additive variance. When we extended the IBS-based HE regression to case-control studies in a subsequent simulation study, we found that it provided a nearly unbiased estimate of heritability, more precise than that estimated via the mixed linear model. Thus, for the case-control scenario, the HE regression is preferable. GEnetic Analysis Repository (GEAR; http://sourceforge.net/p/gbchen/wiki/GEAR/) software implemented the HE regression method and is freely available. PMID:24817879
Rudolf Keller
2004-08-10
In this project, a concept to improve the performance of aluminum production cells by introducing potlining additives was examined and tested. Boron oxide was added to cathode blocks, and titanium was dissolved in the metal pool; this resulted in the formation of titanium diboride and caused the molten aluminum to wet the carbonaceous cathode surface. Such wetting reportedly leads to operational improvements and extended cell life. In addition, boron oxide suppresses cyanide formation. This final report presents and discusses the results of this project. Substantial economic benefits for the practical implementation of the technology are projected, especially for modern cells with graphitized blocks. For example, with an energy savings of about 5% and an increase in pot life from 1500 to 2500 days, a cost savings of $ 0.023 per pound of aluminum produced is projected for a 200 kA pot.
Harrup, Mason K; Rollins, Harry W
2013-11-26
An additive comprising a phosphazene compound that has at least two reactive functional groups and at least one capping functional group bonded to phosphorus atoms of the phosphazene compound. One of the at least two reactive functional groups is configured to react with cellulose and the other of the at least two reactive functional groups is configured to react with a resin, such as an amine resin of a polycarboxylic acid resin. The at least one capping functional group is selected from the group consisting of a short chain ether group, an alkoxy group, or an aryloxy group. Also disclosed are an additive-resin admixture, a method of treating a wood product, and a wood product.
Embedded Sensors for Measuring Surface Regression
NASA Technical Reports Server (NTRS)
Gramer, Daniel J.; Taagen, Thomas J.; Vermaak, Anton G.
2006-01-01
non-eroding end of the sensor. The sensor signal can be transmitted from inside a high-pressure chamber to the ambient environment, using commercially available feedthrough connectors. Miniaturized internal recorders or wireless data transmission could also potentially be employed to eliminate the need for producing penetrations in the chamber case. The rungs are designed so that as each successive rung is eroded away, the resistance changes by an amount that yields a readily measurable signal larger than the background noise. (In addition, signal-conditioning techniques are used in processing the resistance readings to mitigate the effect of noise.) Hence, each discrete change of resistance serves to indicate the arrival of the regressing host material front at the known depth of the affected resistor rung. The average rate of regression between two adjacent resistors can be calculated simply as the distance between the resistors divided by the time interval between their resistance jumps. Advanced data reduction techniques have also been developed to establish the instantaneous surface position and regression rate when the regressing front is between rungs.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-09-12
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milking by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait Random Regression Models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. Fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability in milk flow estimated by the most parsimonious model was of moderate to high magnitude.
NASA Astrophysics Data System (ADS)
Jordens, Kurt
1999-12-01
The sol-gel process has been employed to generate hybrid inorganic-organic network materials. Unique ceramers were prepared based on an alkoxysilane functionalized soft organic oligomer, poly(propylene oxide (PPO), and tetramethoxysilane (TMOS). Despite the formation of covalent bonds between the inorganic and organic constituents, the resulting network materials were phase separated, composed of a silicate rich phase embedded in a matrix of the organic oligomer chains. The behavior of such materials was similar to elastomers containing a reinforcing filler. The study focused on the influence of initial oligomer molecular weight, functionality, and tetramethoxysilane, water, and acid catalyst content on the final structure, mechanical and thermal properties. The sol-gel approach has also been exploited to generate thin, transparent, abrasion resistant coatings for metal substrates. These systems were based on alkoxysilane functionalized diethylenetriamine (DETA) with TMOS, which generated hybrid networks with very high crosslink densities. These materials were applied with great success as abrasion resistant coatings to aluminum, copper, brass, and stainless steel. In another study, intercalated polymer-clay nanocomposites were prepared based on various epoxy networks montmorillonite clay. This work explored the influence of incorporated clay on the adhesive properties of the epoxies. The lap shear strength decreased with increasing day content This was due to a reduction in the toughness of the epoxy. Also, the delaminated (or exfoliated) nanocomposite structure could not be generated. Instead, all nanocomposite systems possessed an intercalated structure. The final project involved the characterization of a series of metallocene catalyzed linear polyethylenes, produced at Phillips Petroleum. Polyolefins synthesized with such new catalyst systems are becoming widely available. The influence of molecular weight and thermal treatment on the mechanical, rheological
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. Results: From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e
Supplier Selection Using Weighted Utility Additive Method
NASA Astrophysics Data System (ADS)
Karande, Prasad; Chakraborty, Shankar
2015-10-01
Supplier selection is a multi-criteria decision-making (MCDM) problem which mainly involves evaluating a number of available suppliers according to a set of common criteria for choosing the best one to meet the organizational needs. For any manufacturing or service organization, selecting the right upstream suppliers is a key success factor that will significantly reduce purchasing cost, increase downstream customer satisfaction and improve competitive ability. The past researchers have attempted to solve the supplier selection problem employing different MCDM techniques which involve active participation of the decision makers in the decision-making process. This paper deals with the application of weighted utility additive (WUTA) method for solving supplier selection problems. The WUTA method, an extension of utility additive approach, is based on ordinal regression and consists of building a piece-wise linear additive decision model from a preference structure using linear programming (LP). It adopts preference disaggregation principle and addresses the decision-making activities through operational models which need implicit preferences in the form of a preorder of reference alternatives or a subset of these alternatives present in the process. The preferential preorder provided by the decision maker is used as a restriction of a LP problem, which has its own objective function, minimization of the sum of the errors associated with the ranking of each alternative. Based on a given reference ranking of alternatives, one or more additive utility functions are derived. Using these utility functions, the weighted utilities for individual criterion values are combined into an overall weighted utility for a given alternative. It is observed that WUTA method, having a sound mathematical background, can provide accurate ranking to the candidate suppliers and choose the best one to fulfill the organizational requirements. Two real time examples are illustrated to prove
ERIC Educational Resources Information Center
Jiang, Ying Hong; Smith, Philip L.
This Monte Carlo study explored relationships among standard and unstandardized regression coefficients, structural coefficients, multiple R_ squared, and significance level of predictors for a variety of linear regression scenarios. Ten regression models with three predictors were included, and four conditions were varied that were expected to…
Bowlby, Heather D; Gibson, A Jamie F
2015-01-01
Describing how population-level survival rates are influenced by environmental change becomes necessary during recovery planning to identify threats that should be the focus for future remediation efforts. However, the ways in which data are analyzed have the potential to change our ecological understanding and thus subsequent recommendations for remedial actions to address threats. In regression, distributional assumptions underlying short time series of survival estimates cannot be investigated a priori and data likely contain points that do not follow the general trend (outliers) as well as contain additional variation relative to an assumed distribution (overdispersion). Using juvenile survival data from three endangered Atlantic salmon Salmo salar L. populations in response to hydrological variation, four distributions for the response were compared using lognormal and generalized linear models (GLM). The influence of outliers as well as overdispersion was investigated by comparing conclusions from robust regressions with these lognormal models and GLMs. The analyses strongly supported the use of a lognormal distribution for survival estimates (i.e., modeling the instantaneous rate of mortality as the response) and would have led to ambiguity in the identification of significant hydrological predictors as well as low overall confidence in the predicted relationships if only GLMs had been considered. However, using robust regression to evaluate the effect of additional variation and outliers in the data relative to regression assumptions resulted in a better understanding of relationships between hydrological variables and survival that could be used for population-specific recovery planning. This manuscript highlights how a systematic analysis that explicitly considers what monitoring data represent and where variation is likely to come from is required in order to draw meaningful conclusions when analyzing changes in survival relative to environmental
Regression analysis of cytopathological data
Whittemore, A.S.; McLarty, J.W.; Fortson, N.; Anderson, K.
1982-12-01
Epithelial cells from the human body are frequently labelled according to one of several ordered levels of abnormality, ranging from normal to malignant. The label of the most abnormal cell in a specimen determines the score for the specimen. This paper presents a model for the regression of specimen scores against continuous and discrete variables, as in host exposure to carcinogens. Application to data and tests for adequacy of model fit are illustrated using sputum specimens obtained from a cohort of former asbestos workers.
ERIC Educational Resources Information Center
Lopez Alonso, A. O.
A linear relationship was found between judgements given by 160 subjects to 7 objects presented as single stimuli (alpha judgements) and judgements given to the same objects presented with a condition (gamma judgements). This relationship holds for alpha judgements and the gamma judgements that belong to a family of constant stimulus and varying…
Buonaccorsi, John; Prochenka, Agnieszka; Thoresen, Magne; Ploski, Rafal
2016-09-30
Motivated by a genetic application, this paper addresses the problem of fitting regression models when the predictor is a proportion measured with error. While the problem of dealing with additive measurement error in fitting regression models has been extensively studied, the problem where the additive error is of a binomial nature has not been addressed. The measurement errors here are heteroscedastic for two reasons; dependence on the underlying true value and changing sampling effort over observations. While some of the previously developed methods for treating additive measurement error with heteroscedasticity can be used in this setting, other methods need modification. A new version of simulation extrapolation is developed, and we also explore a variation on the standard regression calibration method that uses a beta-binomial model based on the fact that the true value is a proportion. Although most of the methods introduced here can be used for fitting non-linear models, this paper will focus primarily on their use in fitting a linear model. While previous work has focused mainly on estimation of the coefficients, we will, with motivation from our example, also examine estimation of the variance around the regression line. In addressing these problems, we also discuss the appropriate manner in which to bootstrap for both inferences and bias assessment. The various methods are compared via simulation, and the results are illustrated using our motivating data, for which the goal is to relate the methylation rate of a blood sample to the age of the individual providing the sample. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.
2000-01-01
The NASA Engine Performance Program (NEPP) can configure and analyze almost any type of gas turbine engine that can be generated through the interconnection of a set of standard physical components. In addition, the code can optimize engine performance by changing adjustable variables under a set of constraints. However, for engine cycle problems at certain operating points, the NEPP code can encounter difficulties: nonconvergence in the currently implemented Powell's optimization algorithm and deficiencies in the Newton-Raphson solver during engine balancing. A project was undertaken to correct these deficiencies. Nonconvergence was avoided through a cascade optimization strategy, and deficiencies associated with engine balancing were eliminated through neural network and linear regression methods. An approximation-interspersed cascade strategy was used to optimize the engine's operation over its flight envelope. Replacement of Powell's algorithm by the cascade strategy improved the optimization segment of the NEPP code. The performance of the linear regression and neural network methods as alternative engine analyzers was found to be satisfactory. This report considers two examples-a supersonic mixed-flow turbofan engine and a subsonic waverotor-topped engine-to illustrate the results, and it discusses insights gained from the improved version of the NEPP code.
Relativistic Linear Restoring Force
ERIC Educational Resources Information Center
Clark, D.; Franklin, J.; Mann, N.
2012-01-01
We consider two different forms for a relativistic version of a linear restoring force. The pair comes from taking Hooke's law to be the force appearing on the right-hand side of the relativistic expressions: d"p"/d"t" or d"p"/d["tau"]. Either formulation recovers Hooke's law in the non-relativistic limit. In addition to these two forces, we…
Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J
2012-12-01
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
Multiatlas Segmentation as Nonparametric Regression
Awate, Suyash P.; Whitaker, Ross T.
2015-01-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator’s convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems. PMID:24802528
Relationships of Measurement Error and Prediction Error in Observed-Score Regression
ERIC Educational Resources Information Center
Moses, Tim
2012-01-01
The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…
ERIC Educational Resources Information Center
Ker, H. W.
2014-01-01
Multilevel data are very common in educational research. Hierarchical linear models/linear mixed-effects models (HLMs/LMEs) are often utilized to analyze multilevel data nowadays. This paper discusses the problems of utilizing ordinary regressions for modeling multilevel educational data, compare the data analytic results from three regression…
Hrouzek, Pavel; Štys, Dalibor; Martens, Harald
2013-01-01
Responsivity is a conversion qualification of a measurement device given by the functional dependence between the input and output quantities. A concentration-response-dependent calibration curve represents the most simple experiment for the measurement of responsivity in mass spectrometry. The cyanobacterial hepatotoxin microcystin-LR content in complex biological matrices of food additives was chosen as a model example of a typical problem. The calibration curves for pure microcystin and its mixtures with extracts of green alga and fish meat were reconstructed from the series of measurement. A novel approach for the quantitative estimation of ion competition in ESI is proposed in this paper. We define the correlated responsivity offset in the intensity values using the approximation of minimal correlation given by the matrix to the target mass values of the analyte. The estimation of the matrix influence enables the approximation of the position of a priori unknown responsivity and was easily evaluated using a simple algorithm. The method itself is directly derived from the basic attributes of the theory of measurements. There is sufficient agreement between the theoretical and experimental values. However, some theoretical issues are discussed to avoid misinterpretations and excessive expectations. PMID:23586036
Shell Element Verification & Regression Problems for DYNA3D
Zywicz, E
2008-02-01
A series of quasi-static regression/verification problems were developed for the triangular and quadrilateral shell element formulations contained in Lawrence Livermore National Laboratory's explicit finite element program DYNA3D. Each regression problem imposes both displacement- and force-type boundary conditions to probe the five independent nodal degrees of freedom employed in the targeted formulation. When applicable, the finite element results are compared with small-strain linear-elastic closed-form reference solutions to verify select aspects of the formulations implementation. Although all problems in the suite depict the same geometry, material behavior, and loading conditions, each problem represents a unique combination of shell formulation, stabilization method, and integration rule. Collectively, the thirty-six new regression problems in the test suite cover nine different shell formulations, three hourglass stabilization methods, and three families of through-thickness integration rules.
Regression with repeated measures in the experimental units.
Garsd, A
1999-01-01
The most satisfactory solution to the problem of modeling a family of regressions with repeated measures in the experimental units is multivariate in nature. However, multivariate methods are difficult to follow and implement. Furthermore, by keeping the focus on the experimental unit, a family of simple univariate linear models will often parallel both the investigator's intuitive grasp of the statistical task at hand. We present two examples based on data from a study of the suckling stimulus during breastfeeding in newborn infants. We show how a family of regression lines can provide useful, if approximate, answers to the questions of interest. One example involves a regression setting proper and the other a typical case of correlation. We discuss alternative univariate models that may be useful for this type of problems.
Nonparametric supervised learning by linear interpolation with maximum entropy.
Gupta, Maya R; Gray, Robert M; Olshen, Richard A
2006-05-01
Nonparametric neighborhood methods for learning entail estimation of class conditional probabilities based on relative frequencies of samples that are "near-neighbors" of a test point. We propose and explore the behavior of a learning algorithm that uses linear interpolation and the principle of maximum entropy (LIME). We consider some theoretical properties of the LIME algorithm: LIME weights have exponential form; the estimates are consistent; and the estimates are robust to additive noise. In relation to bias reduction, we show that near-neighbors contain a test point in their convex hull asymptotically. The common linear interpolation solution used for regression on grids or look-up-tables is shown to solve a related maximum entropy problem. LIME simulation results support use of the method, and performance on a pipeline integrity classification problem demonstrates that the proposed algorithm has practical value.
Recognition of caudal regression syndrome.
Boulas, Mari M
2009-04-01
Caudal regression syndrome, also referred to as caudal dysplasia and sacral agenesis syndrome, is a rare congenital malformation characterized by varying degrees of developmental failure early in gestation. It involves the lower extremities, the lumbar and coccygeal vertebrae, and corresponding segments of the spinal cord. This is a rare disorder, and true pathogenesis is unclear. The etiology is thought to be related to maternal diabetes, genetic predisposition, and vascular hypoperfusion, but no true causative factor has been determined. Fetal diagnostic tools allow for early recognition of the syndrome, and careful examination of the newborn is essential to determine the extent of the disorder. Associated organ system dysfunction depends on the severity of the disease. Related defects are structural, and systematic problems including respiratory, cardiac, gastrointestinal, urinary, orthopedic, and neurologic can be present in varying degrees of severity and in different combinations. A multidisciplinary approach to management is crucial. Because the primary pathology is irreversible, treatment is only supportive.
Lumbar herniated disc: spontaneous regression
Yüksel, Kasım Zafer
2017-01-01
Background Low back pain is a frequent condition that results in substantial disability and causes admission of patients to neurosurgery clinics. To evaluate and present the therapeutic outcomes in lumbar disc hernia (LDH) patients treated by means of a conservative approach, consisting of bed rest and medical therapy. Methods This retrospective cohort was carried out in the neurosurgery departments of hospitals in Kahramanmaraş city and 23 patients diagnosed with LDH at the levels of L3−L4, L4−L5 or L5−S1 were enrolled. Results The average age was 38.4 ± 8.0 and the chief complaint was low back pain and sciatica radiating to one or both lower extremities. Conservative treatment was administered. Neurological examination findings, durations of treatment and intervals until symptomatic recovery were recorded. Laségue tests and neurosensory examination revealed that mild neurological deficits existed in 16 of our patients. Previously, 5 patients had received physiotherapy and 7 patients had been on medical treatment. The number of patients with LDH at the level of L3−L4, L4−L5, and L5−S1 were 1, 13, and 9, respectively. All patients reported that they had benefit from medical treatment and bed rest, and radiologic improvement was observed simultaneously on MRI scans. The average duration until symptomatic recovery and/or regression of LDH symptoms was 13.6 ± 5.4 months (range: 5−22). Conclusions It should be kept in mind that lumbar disc hernias could regress with medical treatment and rest without surgery, and there should be an awareness that these patients could recover radiologically. This condition must be taken into account during decision making for surgical intervention in LDH patients devoid of indications for emergent surgery. PMID:28119770
Scientific Progress or Regress in Sports Physiology?
Böning, Dieter
2016-11-01
In modern societies there is strong belief in scientific progress, but, unfortunately, a parallel partial regress occurs because of often avoidable mistakes. Mistakes are mainly forgetting, erroneous theories, errors in experiments and manuscripts, prejudice, selected publication of "positive" results, and fraud. An example of forgetting is that methods introduced decades ago are used without knowing the underlying theories: Basic articles are no longer read or cited. This omission may cause incorrect interpretation of results. For instance, false use of actual base excess instead of standard base excess for calculation of the number of hydrogen ions leaving the muscles raised the idea that an unknown fixed acid is produced in addition to lactic acid during exercise. An erroneous theory led to the conclusion that lactate is not the anion of a strong acid but a buffer. Mistakes occur after incorrect application of a method, after exclusion of unwelcome values, during evaluation of measurements by false calculations, or during preparation of manuscripts. Co-authors, as well as reviewers, do not always carefully read papers before publication. Peer reviewers might be biased against a hypothesis or an author. A general problem is selected publication of positive results. An example of fraud in sports medicine is the presence of doped subjects in groups of investigated athletes. To reduce regress, it is important that investigators search both original and recent articles on a topic and conscientiously examine the data. All co-authors and reviewers should read the text thoroughly and inspect all tables and figures in a manuscript.
A secure distributed logistic regression protocol for the detection of rare adverse drug events
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-01-01
Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for
Jamrozik, J; McGrath, S; Kemp, R A; Miller, S P
2013-08-01
Stayability to consecutive calvings was selected as a measure of cow longevity in the Canadian Simmental population. Calving performance data on 188,579 cows and culling information from the Total Herd Reporting System were used to determine whether a cow stayed in a herd for her second and later (up to the eighth) calvings, given that she had calved as 2 yr old. Binary records (n = 1,164,319) were analyzed with animal linear and threshold models including fixed effects of year of birth by season of birth by parity number and age of cow at first calving by parity number and random effects of contemporary group (CG) defined as herd of birth within year by season, animal additive genetic effect, and a cow permanent environmental (PE) effect. All random effects were Legendre polynomial regressions of the same order, defined on the scale from second to the eighth calving. Bayesian methods with Gibbs sampling were used to estimate covariance components and genetic parameters for random effects of models and selected variables on the longitudinal scale. Bayes factors and analyses of mean squared error and correlation between observed and predicted observations indicated that the linear model with regressions of order 3 was most plausible for generating the current data compared with a fixed regression and other random regression (both linear and threshold) models of order up to 4. Estimates of variances for all random effects from the best fitting model changed with the calving number. Estimates of heritability decreased in time: from 0.35 (SD = 0.006) for stayability to second calving to 0.13 (SD = 0.004) for stayability to the eighth calving. Variance due to PE effect constituted the largest part of the total variance of stayability for all longitudinal points followed by genetic and CG components. Genetic effects of stayability to different calvings were relatively highly correlated, from 0.62 (SD = 0.011) to 0.99 (SD = 0.001), and correlation decreased with the time
Early development and regression in Rett syndrome.
Lee, J Y L; Leonard, H; Piek, J P; Downs, J
2013-12-01
This study utilized developmental profiling to examine symptoms in 14 girls with genetically confirmed Rett syndrome and whose families were participating in the Australian Rett syndrome or InterRett database. Regression was mostly characterized by loss of hand and/or communication skills (13/14) except one girl demonstrated slowing of skill development. Social withdrawal and inconsolable crying often developed simultaneously (9/14), with social withdrawal for shorter duration than inconsolable crying. Previously acquired gross motor skills declined in just over half of the sample (8/14), mostly observed as a loss of balance. Early abnormalities such as vomiting and strabismus were also seen. Our findings provide additional insight into the early clinical profile of Rett syndrome.
Abundant Inverse Regression using Sufficient Reduction and its Applications
Kim, Hyunwoo J.; Smith, Brandon M.; Adluru, Nagesh; Dyer, Charles R.; Johnson, Sterling C.; Singh, Vikas
2016-01-01
Statistical models such as linear regression drive numerous applications in computer vision and machine learning. The landscape of practical deployments of these formulations is dominated by forward regression models that estimate the parameters of a function mapping a set of p covariates, x, to a response variable, y. The less known alternative, Inverse Regression, offers various benefits that are much less explored in vision problems. The goal of this paper is to show how Inverse Regression in the “abundant” feature setting (i.e., many subsets of features are associated with the target label or response, as is the case for images), together with a statistical construction called Sufficient Reduction, yields highly flexible models that are a natural fit for model estimation tasks in vision. Specifically, we obtain formulations that provide relevance of individual covariates used in prediction, at the level of specific examples/samples — in a sense, explaining why a particular prediction was made. With no compromise in performance relative to other methods, an ability to interpret why a learning algorithm is behaving in a specific way for each prediction, adds significant value in numerous applications. We illustrate these properties and the benefits of Abundant Inverse Regression (AIR) on three distinct applications. PMID:27796010
Quantile regression provides a fuller analysis of speed data.
Hewson, Paul
2008-03-01
Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated.
copCAR: A Flexible Regression Model for Areal Data.
Hughes, John
2015-09-16
Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copula-based and employs the areal GLMM's conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online.
Regression Model Optimization for the Analysis of Experimental Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
Panel regressions to estimate low-flow response to rainfall variability in ungaged basins
NASA Astrophysics Data System (ADS)
Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.
2016-12-01
Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
NUCLEI SEGMENTATION VIA SPARSITY CONSTRAINED CONVOLUTIONAL REGRESSION
Zhou, Yin; Chang, Hang; Barner, Kenneth E.; Parvin, Bahram
2017-01-01
Automated profiling of nuclear architecture, in histology sections, can potentially help predict the clinical outcomes. However, the task is challenging as a result of nuclear pleomorphism and cellular states (e.g., cell fate, cell cycle), which are compounded by the batch effect (e.g., variations in fixation and staining). Present methods, for nuclear segmentation, are based on human-designed features that may not effectively capture intrinsic nuclear architecture. In this paper, we propose a novel approach, called sparsity constrained convolutional regression (SCCR), for nuclei segmentation. Specifically, given raw image patches and the corresponding annotated binary masks, our algorithm jointly learns a bank of convolutional filters and a sparse linear regressor, where the former is used for feature extraction, and the latter aims to produce a likelihood for each pixel being nuclear region or background. During classification, the pixel label is simply determined by a thresholding operation applied on the likelihood map. The method has been evaluated using the benchmark dataset collected from The Cancer Genome Atlas (TCGA). Experimental results demonstrate that our method outperforms traditional nuclei segmentation algorithms and is able to achieve competitive performance compared to the state-of-the-art algorithm built upon human-designed features with biological prior knowledge. PMID:28101301
A rotor optimization using regression analysis
NASA Technical Reports Server (NTRS)
Giansante, N.
1984-01-01
The design and development of helicopter rotors is subject to the many design variables and their interactions that effect rotor operation. Until recently, selection of rotor design variables to achieve specified rotor operational qualities has been a costly, time consuming, repetitive task. For the past several years, Kaman Aerospace Corporation has successfully applied multiple linear regression analysis, coupled with optimization and sensitivity procedures, in the analytical design of rotor systems. It is concluded that approximating equations can be developed rapidly for a multiplicity of objective and constraint functions and optimizations can be performed in a rapid and cost effective manner; the number and/or range of design variables can be increased by expanding the data base and developing approximating functions to reflect the expanded design space; the order of the approximating equations can be expanded easily to improve correlation between analyzer results and the approximating equations; gradients of the approximating equations can be calculated easily and these gradients are smooth functions reducing the risk of numerical problems in the optimization; the use of approximating functions allows the problem to be started easily and rapidly from various initial designs to enhance the probability of finding a global optimum; and the approximating equations are independent of the analysis or optimization codes used.
Genetics Home Reference: caudal regression syndrome
... of a genetic condition? Genetic and Rare Diseases Information Center Frequency Caudal regression syndrome is estimated to occur in 1 to ... parts of the skeleton, gastrointestinal system, and genitourinary ... caudal regression syndrome results from the presence of an abnormal ...