#### Sample records for age-adjusted linear regression

1. Multiple linear regression.

PubMed

Eberly, Lynn E

2007-01-01

This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout. PMID:18450050
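The estimation step named in this record can be sketched as follows. This is a minimal illustration, not code from the chapter: it assumes ordinary least squares solved through the normal equations, and the function names (`solve_linear_system`, `fit_multiple_regression`) and the data are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_multiple_regression(
    predictors: Sequence[Sequence[float]], outcome: Sequence[float]
) -> List[float]:
    """Return [intercept, b1, ..., bp] minimizing the residual sum of squares."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: the outcome is generated exactly as 1 + 2*x1 + 3*x2.
predictors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
outcome = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]
coefficients = fit_multiple_regression(predictors, outcome)
```

Because the outcome here contains no noise, the fitted coefficients recover the generating values up to rounding. With real data, the inference and model-fit steps the chapter describes would follow.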

2. Fast Censored Linear Regression

PubMed Central

Huang, Yijian

2013-01-01

Weighted log-rank estimating function has become a standard estimation method for the censored linear regression model, or the accelerated failure time model. Well established statistically, the estimator defined as a consistent root has, however, rather poor computational properties because the estimating function is neither continuous nor, in general, monotone. We propose a computationally efficient estimator through an asymptotics-guided Newton algorithm, in which censored quantile regression methods are tailored to yield an initial consistent estimate and a consistent derivative estimate of the limiting estimating function. We also develop fast interval estimation with a new proposal for sandwich variance estimation. The proposed estimator is asymptotically equivalent to the consistent root estimator and barely distinguishable in samples of practical size. However, computation time is typically reduced by two to three orders of magnitude for point estimation alone. Illustrations with clinical applications are provided. PMID:24347802

3. Correlation and simple linear regression.

PubMed

Eberly, Lynn E

2007-01-01

This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression. PMID:18450049

4. Recursive Algorithm For Linear Regression

NASA Technical Reports Server (NTRS)

Varanasi, S. V.

1988-01-01

Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.

5. Multiple linear regression analysis

NASA Technical Reports Server (NTRS)

Edwards, T. R.

1980-01-01

Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

6. Practical Session: Simple Linear Regression

Clausel, M.; Grégoire, G.

2014-12-01

Two exercises are proposed to illustrate simple linear regression. The first one is based on Galton's famous data set on heredity. We use the lm R command and get coefficient estimates, the residual standard error, R2, residuals… In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).

7. Linear regression in astronomy. II

NASA Technical Reports Server (NTRS)

Feigelson, Eric D.; Babu, Gutti J.

1992-01-01

A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
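Jackknife resampling of an unweighted regression line, item (1) in this record, can be sketched as below. This is a generic leave-one-out jackknife for the OLS slope, not the authors' code; the function names and data are hypothetical.

```python
import math
from typing import List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the unweighted least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def jackknife_slope_se(x: Sequence[float], y: Sequence[float]) -> float:
    """Leave-one-out jackknife standard error of the OLS slope."""
    xs: List[float] = list(x)
    ys: List[float] = list(y)
    n = len(xs)
    # Refit the line n times, each time dropping one data point.
    slopes = [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)
    ]
    mean_slope = sum(slopes) / n
    # Standard jackknife variance: (n - 1)/n times the sum of squared deviations.
    variance = (n - 1) / n * sum((s - mean_slope) ** 2 for s in slopes)
    return math.sqrt(variance)


# Hypothetical data: one exact line and one noisy version of it.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
exact_se = jackknife_slope_se(x_values, [3 * v - 1 for v in x_values])
noisy_se = jackknife_slope_se(x_values, [2.1, 4.8, 8.3, 10.9, 14.2])
```

The standard error vanishes (up to rounding) when every point lies on one line and grows with the scatter. A bootstrap version would instead refit the line on samples drawn with replacement.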

8. Linear regression in astronomy. I

NASA Technical Reports Server (NTRS)

Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

1990-01-01

Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
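The five slope estimators named in this record can all be written in terms of the centered sums of squares and cross-products. The sketch below uses the commonly quoted closed forms and covers slopes only, without the uncertainty formulas; the function name and data are hypothetical, and each fitted line is assumed to pass through the data centroid.

```python
import math
from typing import Dict, Sequence


def five_slopes(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Slopes (dy/dx) of five symmetric and asymmetric least-squares lines."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("slopes are undefined for constant or uncorrelated data")

    sign = math.copysign(1.0, sxy)
    b1 = sxy / sxx  # OLS regression of Y on X
    b2 = syy / sxy  # OLS regression of X on Y, expressed as dy/dx
    # Line bisecting the angle between the two OLS lines.
    bisector = (b1 * b2 - 1 + math.sqrt((1 + b1 ** 2) * (1 + b2 ** 2))) / (b1 + b2)
    # Orthogonal regression: minimizes perpendicular distances to the line.
    diff = b2 - 1 / b1
    orthogonal = 0.5 * (diff + sign * math.sqrt(4 + diff ** 2))
    # Reduced major axis: geometric mean of the two OLS slopes.
    reduced_major_axis = sign * math.sqrt(b1 * b2)
    return {
        "ols_y_on_x": b1,
        "ols_x_on_y": b2,
        "ols_bisector": bisector,
        "orthogonal": orthogonal,
        "reduced_major_axis": reduced_major_axis,
    }


# Hypothetical data: an exact line and a scattered version.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
exact = five_slopes(xs, [2 * v + 1 for v in xs])
scattered = five_slopes(xs, [2.1, 3.9, 6.2, 7.8, 10.1])
```

On an exact line all five estimators coincide. With scatter, the two OLS slopes bracket the symmetric estimators, and the intercept for any of them follows as `mean_y - slope * mean_x`.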

9. Practical Session: Multiple Linear Regression

Clausel, M.; Grégoire, G.

2014-12-01

Three exercises are proposed to illustrate multiple linear regression. The first one investigates the influence of several factors on atmospheric pollution. It has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data coming from 20 U.S. cities. Exercise 2 is an introduction to model selection whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).

10. LRGS: Linear Regression by Gibbs Sampling

2016-02-01

LRGS (Linear Regression by Gibbs Sampling) implements a Gibbs sampler to solve the problem of multivariate linear regression with uncertainties in all measured quantities and intrinsic scatter. LRGS extends an algorithm by Kelly (2007) that used Gibbs sampling for performing linear regression in fairly general cases in two ways: generalizing the procedure for multiple response variables, and modeling the prior distribution of covariates using a Dirichlet process.

11. Three-Dimensional Modeling in Linear Regression.

ERIC Educational Resources Information Center

Herman, James D.

Linear regression examines the relationship between one or more independent (predictor) variables and a dependent variable. By using a particular formula, regression determines the weights needed to minimize the error term for a given set of predictors. With one predictor variable, the relationship between the predictor and the dependent variable…

12. A Constrained Linear Estimator for Multiple Regression

ERIC Educational Resources Information Center

Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

2010-01-01

"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, 1979), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…

13. Multiple Linear Regression: A Realistic Reflector.

ERIC Educational Resources Information Center

Nutt, A. T.; Batsell, R. R.

Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…

14. Moving the Bar: Transformations in Linear Regression.

ERIC Educational Resources Information Center

Miranda, Janet

The assumption that is most important to the hypothesis testing procedure of multiple linear regression is the assumption that the residuals are normally distributed, but this assumption is not always tenable given the realities of some data sets. When normal distribution of the residuals is not met, an alternative method can be initiated. As an…

15. A tutorial on Bayesian Normal linear regression

Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens

2015-12-01

Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.

16. Mental chronometry with simple linear regression.

PubMed

Chen, J Y

1997-10-01

Typically, mental chronometry is performed by means of introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that involved a selective effect and with a task (word superiority effect) that involved a global effect, by the dominant theories. The results indicate (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These ratify the regression approach as a useful method for doing mental chronometry. PMID:9347535

17. Multiple linear regression for isotopic measurements

Garcia Alonso, J. I.

2012-04-01

There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection of man-made variations using enriched isotopes as indicators. For both types of measurement, accurate and precise isotope ratio measurements are required. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr or Nd and Sm is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. This procedure also allows matrix separation so that precise and accurate Sr and Nd isotope ratios can be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography and on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data resulted in Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for the labelling of individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique for a given individual, can be inherited in living organisms, and is stable. The detection of the isotopic signature is also based on multiple linear regression. The labelling of fish and its detection in otoliths by Laser Ablation ICP-MS will be discussed using trout and salmon as examples. In conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.

18. Double linear regression classification for face recognition

Feng, Qingxiang; Zhu, Qi; Tang, Lin-Lin; Pan, Jeng-Shyang

2015-02-01

A new classifier, named the double linear regression classification (DLRC) classifier, is proposed for image recognition in this paper; it is based on the linear regression classification (LRC) classifier and the simple-fast representation-based classifier (SFR). The traditional LRC classifier uses only the distance between test image vectors and predicted image vectors of the class subspace for classification, and the SFR classifier uses the test image vectors and the nearest image vectors of the class subspace to classify the test sample. The DLRC classifier, however, computes the predicted image vectors of each class subspace and uses all the predicted vectors to construct a novel robust global space. The DLRC then uses this global space to obtain new predicted vectors of each class for classification. A large number of experiments on the AR, JAFFE, Yale, Extended YaleB, and PIE face databases are used to evaluate the performance of the proposed classifier. The experimental results show that the proposed classifier achieves a better recognition rate than the LRC classifier, the SFR classifier, and several other classifiers.

19. Sparse brain network using penalized linear regression

Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.

2011-03-01

Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with an l1-norm penalty. The method is applied to a brain network consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of children with autism spectrum disorder (ASD) and pediatric control (PedCon) subjects. To validate the results, we check the reproducibility of the obtained brain networks by leave-one-out cross validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
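A linear regression with an l1-norm penalty can be solved in several ways. The sketch below uses cyclic coordinate descent with soft thresholding, which is an assumption chosen for brevity and not necessarily the solver used in the paper. The function names and the toy data are hypothetical, and the objective is taken to be 0.5·‖y − Xb‖² + penalty·‖b‖₁.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(
    columns: Sequence[Sequence[float]],
    y: Sequence[float],
    penalty: float,
    sweeps: int = 200,
) -> List[float]:
    """Minimize 0.5*||y - X b||^2 + penalty*||b||_1; columns holds X column-wise."""
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue  # an all-zero predictor keeps a zero coefficient
            # Inner product of column j with the residual that excludes its own fit.
            rho = sum(v * (r + beta[j] * v) for v, r in zip(col, residual))
            updated = soft_threshold(rho, penalty) / norm_sq
            delta = updated - beta[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
                beta[j] = updated
    return beta


# Hypothetical toy problem with orthonormal predictors, where the solution
# is the least-squares coefficient soft-thresholded by the penalty.
toy_columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_response = [3.0, 0.5, -2.0]
sparse_beta = lasso_coordinate_descent(toy_columns, toy_response, penalty=1.0)
```

In this toy case the middle coefficient is set exactly to zero and the other two shrink by the penalty, giving `[2.0, 0.0, -1.0]`. In the partial-correlation setting of the record, one such regression would be run for each region against the remaining regions, with the zero coefficients marking absent connections.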

20. A Gibbs sampler for multivariate linear regression

2016-04-01

Kelly described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modelled by a flexible mixture of Gaussians rather than assumed to be uniform. Here, I extend the Kelly algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Secondly, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamically relaxed galaxy clusters as a function of their mass and redshift. An implementation of the Gibbs sampler in the R language, called LRGS, is provided.

1. Fuzzy multiple linear regression: A computational approach

NASA Technical Reports Server (NTRS)

Juang, C. H.; Huang, X. H.; Fleming, J. W.

1992-01-01

This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

2. Augmenting Data with Published Results in Bayesian Linear Regression

ERIC Educational Resources Information Center

de Leeuw, Christiaan; Klugkist, Irene

2012-01-01

In most research, linear regression analyses are performed without taking into account published results (i.e., reported summary statistics) of similar previous studies. Although the prior density in Bayesian linear regression could accommodate such prior knowledge, formal models for doing so are absent from the literature. The goal of this…

3. Who Will Win?: Predicting the Presidential Election Using Linear Regression

ERIC Educational Resources Information Center

Lamb, John H.

2007-01-01

This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus graphing calculators, Microsoft Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…

4. Compound Identification Using Penalized Linear Regression on Metabolomics

PubMed Central

Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho

2014-01-01

Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894

5. A VBA-based Simulation for Teaching Simple Linear Regression

ERIC Educational Resources Information Center

Jones, Gregory Todd; Hagtvedt, Reidar; Jones, Kari

2004-01-01

In spite of the name, simple linear regression presents a number of conceptual difficulties, particularly for introductory students. This article describes a simulation tool that provides a hands-on method for illuminating the relationship between parameters and sample statistics.

6. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

EPA Science Inventory

We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

7. Linear regression analysis of survival data with missing censoring indicators.

PubMed

Wang, Qihua; Dinse, Gregg E

2011-04-01

Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial. PMID:20559722

8. Use of probabilistic weights to enhance linear regression myoelectric control

Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

2015-12-01

Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

9. A Bayesian approach to linear regression in astronomy

Sereno, Mauro

2016-01-01

Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical modelling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modelling. The R-package LIRA (LInear Regression in Astronomy) is made available to perform the regression.

10. Construction cost estimation of municipal incinerators by fuzzy linear regression

SciTech Connect

Chang, N.B.; Chen, Y.L.; Yang, H.H.

1996-12-31

Regression analysis has been widely used in engineering cost estimation. It is recognized that the fuzzy structure in cost estimation is a different type of uncertainty compared to the measurement error in the least-squares regression modeling. Hence, the uncertainties encountered in many events of construction and operating costs estimation and prediction cannot be fully depicted by conventional least-squares regression models. This paper presents a construction cost analysis of municipal incinerators by the techniques of fuzzy linear regression. A thorough investigation of construction costs in the Taiwan Resource Recovery Project was conducted based on design parameters such as design capacity, type of grate system, and the selected air pollution control process. The focus has been placed upon the methodology for dealing with the heterogeneity phenomenon of a set of observations for which regression is evaluated.

11. Direction of Effects in Multiple Linear Regression Models.

PubMed

Wiedermann, Wolfgang; von Eye, Alexander

2015-01-01

Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741

12. Multiple Linear Regression as a Technique for Predicting College Enrollment.

ERIC Educational Resources Information Center

Clegg, Ambrose A.; And Others

The application of multiple linear regression to the problem of identifying appropriate criterion variables and predicting enrollment in college courses during a period of major rapid decline was studied. Data were gathered on course enrollments for 1972-78 at Kent State University, and five independent variables were selected to determine the…

13. Rethinking the linear regression model for spatial ecological data.

PubMed

Wagner, Helene H

2013-11-01

The linear regression model, with its numerous extensions including multivariate ordination, is fundamental to quantitative research in many disciplines. However, spatial or temporal structure in the data may invalidate the regression assumption of independent residuals. Spatial structure at any spatial scale can be modeled flexibly based on a set of uncorrelated component patterns (e.g., Moran's eigenvector maps, MEM) that is derived from the spatial relationships between sampling locations as defined in a spatial weight matrix. Spatial filtering thus addresses spatial autocorrelation in the residuals by adding such component patterns (spatial eigenvectors) as predictors to the regression model. However, space is not an ecologically meaningful predictor, and commonly used tests for selecting significant component patterns do not take into account the specific nature of these variables. This paper proposes "spatial component regression" (SCR) as a new way of integrating the linear regression model with Moran's eigenvector maps. In its unconditioned form, SCR decomposes the relationship between response and predictors by component patterns, whereas conditioned SCR provides an alternative method of spatial filtering, taking into account the statistical properties of component patterns in the design of statistical hypothesis tests. Application to the well-known multivariate mite data set illustrates how SCR may be used to condition for significant residual spatial structure and to identify additional predictors associated with residual spatial structure. Finally, I argue that all variance is spatially structured, hence spatial independence is best characterized by a lack of excess variance at any spatial scale, i.e., spatial white noise. PMID:24400490

14. A linear regression solution to the spatial autocorrelation problem

Griffith, Daniel A.

The Moran Coefficient spatial autocorrelation index can be decomposed into orthogonal map pattern components. This decomposition relates it directly to standard linear regression, in which corresponding eigenvectors can be used as predictors. This paper reports comparative results between these linear regressions and their auto-Gaussian counterparts for the following georeferenced data sets: Columbus (Ohio) crime, Ottawa-Hull median family income, Toronto population density, southwest Ohio unemployment, Syracuse pediatric lead poisoning, Glasgow standard mortality rates, and a small remotely sensed image of the High Peak district. This methodology is extended to auto-logistic and auto-Poisson situations, with selected data analyses including percentage of urban population across Puerto Rico, and the frequency of SIDS cases across North Carolina. These data analytic results suggest that this approach to georeferenced data analysis offers considerable promise.

15. HIGH RESOLUTION FOURIER ANALYSIS WITH AUTO-REGRESSIVE LINEAR PREDICTION

SciTech Connect

Barton, J.; Shirley, D.A.

1984-04-01

Auto-regressive linear prediction is adapted to double the resolution of Angle-Resolved Photoemission Extended Fine Structure (ARPEFS) Fourier transforms. Even with the optimal taper (weighting function), the commonly used taper-and-transform Fourier method has limited resolution: it assumes the signal is zero beyond the limits of the measurement. By seeking the Fourier spectrum of an infinite extent oscillation consistent with the measurements but otherwise having maximum entropy, the errors caused by finite data range can be reduced. Our procedure developed to implement this concept adapts auto-regressive linear prediction to extrapolate the signal in an effective and controllable manner. Difficulties encountered when processing actual ARPEFS data are discussed. A key feature of this approach is the ability to convert improved measurements (signal-to-noise or point density) into improved Fourier resolution.

16. Comparison of Logistic Regression and Linear Regression in Modeling Percentage Data

PubMed Central

Zhao, Lihui; Chen, Yuhuan; Schaffner, Donald W.

2001-01-01

Percentage is widely used to describe different results in food microbiology, e.g., probability of microbial growth, percent inactivated, and percent of positive samples. Four sets of percentage data, percent-growth-positive, germination extent, probability for one cell to grow, and maximum fraction of positive tubes, were obtained from our own experiments and the literature. These data were modeled using linear and logistic regression. Five methods were used to compare the goodness of fit of the two models: percentage of predictions closer to observations, range of the differences (predicted value minus observed value), deviation of the model, linear regression between the observed and predicted values, and bias and accuracy factors. Logistic regression was a better predictor of at least 78% of the observations in all four data sets. In all cases, the deviation of logistic models was much smaller. The linear correlation between observations and logistic predictions was always stronger. Validation (accomplished using part of one data set) also demonstrated that the logistic model was more accurate in predicting new data points. Bias and accuracy factors were found to be less informative when evaluating models developed for percentage data, since neither of these indices can compare predictions at zero. Model simplification for the logistic model was demonstrated with one data set. The simplified model was as powerful in making predictions as the full linear model, and it also gave clearer insight in determining the key experimental factors. PMID:11319091
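One reason a linear model can struggle with percentage data is that its predictions are not confined to the 0 to 100% range. The sketch below contrasts a straight line fitted to proportions with a straight line fitted on the logit scale (a simple stand-in for logistic regression, not the authors' fitting procedure). The data are hypothetical.

```python
import numpy as np

# Hypothetical proportions, strictly inside (0, 1), following an S-shaped curve.
x = np.arange(7, dtype=float)
p = 1.0 / (1.0 + np.exp(-1.5 * (x - 3.0)))

X = np.column_stack([np.ones_like(x), x])

# Straight line fitted directly to the proportions.
b_lin, *_ = np.linalg.lstsq(X, p, rcond=None)

# Straight line fitted on the logit scale, log(p / (1 - p)).
b_logit, *_ = np.linalg.lstsq(X, np.log(p / (1.0 - p)), rcond=None)

def predict_linear(x_new):
    return b_lin[0] + b_lin[1] * x_new

def predict_logit(x_new):
    # Back-transform from the logit scale to a proportion.
    return 1.0 / (1.0 + np.exp(-(b_logit[0] + b_logit[1] * x_new)))

# Predict outside the observed range of x.
x_new = np.array([-2.0, 8.0])
pred_lin = predict_linear(x_new)
pred_logit = predict_logit(x_new)
```

In this example the linear predictions fall below 0 and above 1 outside the observed range, while the logit-scale predictions stay inside the unit interval.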

17. Conditional local influence in case-weights linear regression.

PubMed

Poon, W Y; Poon, Y S

2001-05-01

The local influence approach proposed by Cook (1986) makes use of the normal curvature and the direction achieving the maximum curvature to assess the local influence of minor perturbation of statistical models. When the approach is applied to the linear regression model, the result provides information concerning the data structure different from that contributed by Cook's distance. One of the main advantages of the local influence approach is its ability to handle the simultaneous effect of several cases, namely, the ability to address the problem of 'masking'. However, Lawrance (1995) points out that there are two notions of 'masking' effects, the joint influence and the conditional influence, which are distinct in nature. The normal curvature and the direction of maximum curvature are capable of addressing effects under the category of joint influences but not conditional influences. We construct a new measure to define and detect conditional local influences and use the linear regression model for illustration. Several reported data sets are used to demonstrate that new information can be revealed by this proposed measure. PMID:11393899

18. The extinction law from photometric data: linear regression methods

Ascenso, J.; Lombardi, M.; Lada, C. J.; Alves, J.

2012-04-01

Context. The properties of dust grains, in particular their size distribution, are expected to differ from the interstellar medium to the high-density regions within molecular clouds. Since the extinction at near-infrared wavelengths is caused by dust, the extinction law in cores should depart from that found in low-density environments if the dust grains have different properties. Aims: We explore methods to measure the near-infrared extinction law produced by dense material in molecular cloud cores from photometric data. Methods: Using controlled sets of synthetic and semi-synthetic data, we test several methods for linear regression applied to the specific problem of deriving the extinction law from photometric data. We cover the parameter space appropriate to this type of observations. Results: We find that many of the common linear-regression methods produce biased results when applied to the extinction law from photometric colors. We propose and validate a new method, LinES, as the most reliable for this effect. We explore the use of this method to detect whether or not the extinction law of a given reddened population has a break at some value of extinction. Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile (ESO programmes 069.C-0426 and 074.C-0728).

19. Precipitation interpolation in mountainous regions using multiple linear regression

USGS Publications Warehouse

Hay, L.; Viger, R.; McCabe, G.

1998-01-01

Multiple linear regression (MLR) was used to spatially interpolate precipitation for simulating runoff in the Animas River basin of southwestern Colorado. MLR equations were defined for each time step using measured precipitation as dependent variables. Explanatory variables used in each MLR were derived for the dependent variable locations from a digital elevation model (DEM) using a geographic information system. The same explanatory variables were defined for a 5 × 5 km grid of the DEM. For each time step, the best MLR equation was chosen and used to interpolate precipitation onto the 5 × 5 km grid. The gridded values of precipitation provide a physically-based estimate of the spatial distribution of precipitation and result in reliable simulations of daily runoff in the Animas River basin.
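The per-time-step workflow (fit candidate MLR equations at the stations, keep the best one, apply it to the grid) might look like the sketch below. The abstract does not say how the best equation was chosen, so adjusted R² is an assumption here, as are the station data, the three explanatory variables, and all function names.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and RSS."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def best_subset(X, y):
    """Try every non-empty subset of columns; keep the highest adjusted R^2."""
    n, k = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    best = None
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            beta, rss = fit_ols(X[:, list(cols)], y)
            adj_r2 = 1.0 - (rss / (n - size - 1)) / (tss / (n - 1))
            if best is None or adj_r2 > best[0]:
                best = (adj_r2, cols, beta)
    return best

# Hypothetical stations for one time step: elevation (m), easting, northing (km).
rng = np.random.default_rng(1)
n = 30
stations = np.column_stack([
    rng.uniform(1000.0, 4000.0, n),
    rng.uniform(0.0, 50.0, n),
    rng.uniform(0.0, 50.0, n),
])
precip = 2.0 + 0.004 * stations[:, 0] + rng.normal(0.0, 0.1, n)

adj_r2, cols, beta = best_subset(stations, precip)

# Hypothetical grid cells described by the same explanatory variables.
grid = np.array([
    [1500.0, 5.0, 45.0],
    [2500.0, 25.0, 25.0],
    [3500.0, 40.0, 10.0],
])
grid_precip = beta[0] + grid[:, list(cols)] @ beta[1:]
```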

20. The Dantzig Selector for Censored Linear Regression Models

PubMed Central

Li, Yi; Dicker, Lee; Zhao, Sihai Dave

2013-01-01

The Dantzig variable selector has recently emerged as a powerful tool for fitting regularized regression models. To our knowledge, most work involving the Dantzig selector has been performed with fully-observed response variables. This paper proposes a new class of adaptive Dantzig variable selectors for linear regression models when the response variable is subject to right censoring. This is motivated by a clinical study to identify genes predictive of event-free survival in newly diagnosed multiple myeloma patients. Under some mild conditions, we establish the theoretical properties of our procedures, including consistency in model selection (i.e. the right subset model will be identified with a probability tending to 1) and the optimal efficiency of estimation (i.e. the asymptotic distribution of the estimates is the same as that when the true subset model is known a priori). The practical utility of the proposed adaptive Dantzig selectors is verified via extensive simulations. We apply our new methods to the aforementioned myeloma clinical trial and identify important predictive genes. PMID:24478569

1. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

PubMed Central

Almedeij, Jaber

2012-01-01

Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984

2. The Allometry of Coarse Root Biomass: Log-Transformed Linear Regression or Nonlinear Regression?

PubMed Central

Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J.; Ma, Keping

2013-01-01

Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees. PMID:24116197
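As a small illustration of linear regression on log-transformed data, the sketch below fits a power law, biomass = a · D^b, as a straight line in log-log space. The diameters, the true parameters, and the multiplicative (lognormal) error are hypothetical and do not reproduce the study's data.

```python
import numpy as np

# Hypothetical stem diameters (cm) and root biomass with multiplicative error.
rng = np.random.default_rng(0)
D = np.linspace(5.0, 50.0, 40)
biomass = 0.02 * D ** 2.4 * np.exp(rng.normal(0.0, 0.1, D.size))

# log(biomass) = log(a) + b * log(D), fitted by ordinary least squares.
b, log_a = np.polyfit(np.log(D), np.log(biomass), 1)
a = np.exp(log_a)

def predict(d):
    """Back-transformed prediction; no bias correction is applied here."""
    return a * d ** b

pred_20 = predict(20.0)
```

Under multiplicative lognormal error the back-transformed curve estimates the median biomass at a given diameter, so a correction factor would be needed to estimate the mean.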

3. Outlier Detection In Linear Regression Using Standart Parity Space Approach

Mustafa Durdag, Utkan; Hekimoglu, Serif

2013-04-01

Despite all technological advancements, outliers may occur in engineering measurements because of mistakes. Before the unknown parameters are estimated, these outliers must be detected and removed from the measurements. There are two main groups of outlier detection methods: conventional tests based on the least-squares approach (e.g. Baarda, Pope) and robust tests (e.g. Huber, Hampel). The Standard Parity Space Approach is an important model-based Fault Detection and Isolation (FDI) technique that is commonly used in control engineering. In this study the standard parity space method is used for outlier detection in linear regression. Our main goal is to compare the success of the standard parity space method with that of the conventional tests in linear regression through Monte Carlo simulation. Least-squares estimation is the most common estimator, and it minimizes the sum of squared residuals. In the standard parity space approach, the measurement vector is projected onto the left null space of the coefficient matrix to eliminate the unknown vector. Thus, the orthogonality condition of the parity vector is satisfied and only the effects of the noise vector remain. The residual vector is derived for two cases: the absence of an outlier and the occurrence of an outlier. Its likelihood function is used to determine the detection decision function for the global test. The localization decision function is calculated for each column of the parity matrix, and the measurement with the maximum value is accepted as an outlier. Results were obtained for two outlier magnitude intervals, between 3σ and 6σ (small outliers) and between 6σ and 12σ (large outliers), with the number of unknown parameters set to 2 and 3. The measure success rates (MSR) of Baarda's method are better than those of the standard parity space method when the confidence intervals are
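The projection onto the left null space of the coefficient matrix can be sketched as follows. The localization score used here (a normalized projection of the parity vector onto each column of the parity matrix) is an assumption chosen for illustration and may differ from the decision function in the study; the data and function names are hypothetical.

```python
import numpy as np

def parity_matrix(A):
    """Rows span the left null space of A, so that P @ A = 0."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s.max()))
    return U[:, rank:].T

def locate_outlier(P, y):
    """Score each measurement by how well its parity column explains P @ y."""
    p = P @ y                              # parity vector, free of the unknowns
    norms = np.linalg.norm(P, axis=0)      # length of each column of P
    scores = np.abs(P.T @ p) / norms
    return int(np.argmax(scores)), scores

# Hypothetical straight-line model with 8 noise-free measurements.
x = np.arange(8, dtype=float)
A = np.column_stack([np.ones(8), x])
y = A @ np.array([1.0, 2.0])
y[5] += 10.0                               # inject one gross error

P = parity_matrix(A)
idx, scores = locate_outlier(P, y)
```

Because `P @ A` is zero, the parity vector does not depend on the unknown parameters, and in this noise-free example the largest score points at the contaminated measurement.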

4. Forecasting Groundwater Temperature with Linear Regression Models Using Historical Data.

PubMed

Figura, Simon; Livingstone, David M; Kipfer, Rolf

2015-01-01

Although temperature is an important determinant of many biogeochemical processes in groundwater, very few studies have attempted to forecast the response of groundwater temperature to future climate warming. Using a composite linear regression model based on the lagged relationship between historical groundwater and regional air temperature data, empirical forecasts were made of groundwater temperature in several aquifers in Switzerland up to the end of the current century. The model was fed with regional air temperature projections calculated for greenhouse-gas emissions scenarios A2, A1B, and RCP3PD. Model evaluation revealed that the approach taken is adequate only when the data used to calibrate the models are sufficiently long and contain sufficient variability. These conditions were satisfied for three aquifers, all fed by riverbank infiltration. The forecasts suggest that with respect to the reference period 1980 to 2009, groundwater temperature in these aquifers will most likely increase by 1.1 to 3.8 K by the end of the current century, depending on the greenhouse-gas emissions scenario employed. PMID:25412761

5. Robust linear regression with broad distributions of errors

Postnikov, Eugene B.; Sokolov, Igor M.

2015-09-01

We consider the problem of linear fitting of noisy data in the case of broad (say α-stable) distributions of random impacts ("noise"), which can lack even the first moment. This situation, common in statistical physics of small systems, in Earth sciences, in network science or in econophysics, does not allow for application of conventional Gaussian maximum-likelihood estimators resulting in usual least-squares fits. Such fits lead to large deviations of fitted parameters from their true values due to the presence of outliers. The approaches discussed here aim at minimizing the width of the distribution of residuals. This width can be defined either via the interquantile distance of the distribution or via the scale parameter in its characteristic function. The methods provide robust regression even in the case of short samples with large outliers, and are equivalent to the normal least-squares fit for Gaussian noise. Our discussion is illustrated by numerical examples.
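A minimal sketch of fitting a line by minimizing the interquantile width of the residuals. The grid search over slopes, the choice of the interquartile range as the width, the median-based intercept, and the contaminated data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def iqr(r):
    """Interquartile range, used here as the width of the residuals."""
    q75, q25 = np.percentile(r, [75, 25])
    return q75 - q25

def robust_line(x, y, slopes):
    """Pick the slope with the narrowest residuals; intercept is their median."""
    widths = [iqr(y - b * x) for b in slopes]
    b = slopes[int(np.argmin(widths))]
    a = float(np.median(y - b * x))
    return a, float(b)

# Hypothetical data: an exact line y = 1 + 2x with three gross outliers.
x = np.arange(21, dtype=float)
y = 1.0 + 2.0 * x
y[3] += 100.0
y[10] -= 80.0
y[18] += 150.0

a_rob, b_rob = robust_line(x, y, np.linspace(0.0, 4.0, 401))
b_ols, a_ols = np.polyfit(x, y, 1)   # least-squares fit for comparison
```

In this example the width-minimizing fit recovers the underlying slope, while the least-squares slope is pulled away by the outliers.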

6. Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

ERIC Educational Resources Information Center

Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

2013-01-01

In order to interpret laboratory experimental data, undergraduate students are used to performing linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…

7. Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

ERIC Educational Resources Information Center

Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim

2012-01-01

Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…

8. Sample Sizes when Using Multiple Linear Regression for Prediction

ERIC Educational Resources Information Center

Knofczynski, Gregory T.; Mundfrom, Daniel

2008-01-01

When using multiple regression for prediction purposes, the issue of minimum required sample size often needs to be addressed. Using a Monte Carlo simulation, models with varying numbers of independent variables were examined and minimum sample sizes were determined for multiple scenarios at each number of independent variables. The scenarios…

9. Least-Squares Linear Regression and Schrodinger's Cat: Perspectives on the Analysis of Regression Residuals.

ERIC Educational Resources Information Center

Hecht, Jeffrey B.

The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…

10. Evaluation of preservative systems in a sunscreen formula by linear regression method.

PubMed

Bou-Chacra, Nádia A; Pinto, Terezinha de Jesus A; Ohara, Mitsuko Taba

2003-01-01

A sunscreen formula with eight different preservative systems was evaluated by linear regression, pharmacopeial, and the CTFA (Cosmetic, Toiletry and Fragrance Association) methods. The preparations were tested against Staphylococcus aureus, Burkholderia cepacia, Shewanella putrefaciens, Escherichia coli, and Bacillus sp. The linear regression method proved to be useful in the selection of the most effective preservative system used in cosmetic formulation. PMID:12688287

11. The application of Dynamic Linear Bayesian Models in hydrological forecasting: Varying Coefficient Regression and Discount Weighted Regression

Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa

2015-11-01

A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.

12. Divergent estimation error in portfolio optimization and in linear regression

Kondor, I.; Varga-Haszonits, I.

2008-08-01

The problem of estimation error in portfolio optimization is discussed, in the limit where the portfolio size N and the sample size T go to infinity such that their ratio is fixed. The estimation error strongly depends on the ratio N/T and diverges for a critical value of this parameter. This divergence is the manifestation of an algorithmic phase transition, it is accompanied by a number of critical phenomena, and displays universality. As the structure of a large number of multidimensional regression and modelling problems is very similar to portfolio optimization, the scope of the above observations extends far beyond finance, and covers a large number of problems in operations research, machine learning, bioinformatics, medical science, economics, and technology.

13. Two biased estimation techniques in linear regression: Application to aircraft

NASA Technical Reports Server (NTRS)

1988-01-01

Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
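Principal components regression, one of the two biased techniques named above, can be sketched as a regression on the leading principal components of the centred predictors. The two nearly collinear predictors and the function name `pcr` are hypothetical; the flight test data are not reproduced.

```python
import numpy as np

def pcr(X, y, k):
    """Regress y on the first k principal components of the centred X."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                        # leading principal directions
    Z = Xc @ V_k                          # component scores, Z'Z = diag(s^2)
    gamma = (Z.T @ (y - y_mean)) / (s[:k] ** 2)
    beta = V_k @ gamma                    # coefficients on the original scale
    intercept = y_mean - x_mean @ beta
    return intercept, beta, s

# Hypothetical collinear data: x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.001 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=50)

intercept, beta, s = pcr(X, y, k=1)
condition_number = s[0] / s[-1]           # large value signals collinearity
```

Dropping the small-variance component trades a little bias for a large reduction in variance, which is why the coefficients stay close to the underlying values despite the near-collinearity.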

14. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression

PubMed Central

Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

2014-01-01

The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling's T² test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM outperforms other state-of-the-art methods. PMID:26527844

15. Identifying predictors of physics item difficulty: A linear regression approach

Mesic, Vanes; Muratovic, Hasnija

2011-06-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge

16. Simultaneous Determination of Cobalt, Copper, and Nickel by Multivariate Linear Regression.

ERIC Educational Resources Information Center

1990-01-01

Presented is an experiment where the concentrations of three metal ions in a solution are simultaneously determined by ultraviolet-vis spectroscopy. Availability of the computer program used for statistically analyzing data using a multivariate linear regression is listed. (KR)

17. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

EPA Science Inventory

As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

18. Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

PubMed Central

Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

2012-01-01

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882

19. Graphical Description of Johnson-Neyman Outcomes for Linear and Quadratic Regression Surfaces.

ERIC Educational Resources Information Center

Schafer, William D.; Wang, Yuh-Yin

A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn…

20. Transmission of linear regression patterns between time series: From relationship in time series to complex networks

Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui

2014-07-01

The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
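The construction of a directed, weighted network from sliding-window regressions can be sketched as below. This simplified version labels each window only by the interval its slope falls in and omits the significance-test component of the patterns described in the abstract; the thresholds, labels, function names, and toy series are hypothetical.

```python
from collections import Counter

import numpy as np

def window_slopes(x, y, width):
    """Slope of y regressed on x within each sliding window."""
    return [np.polyfit(x[i:i + width], y[i:i + width], 1)[0]
            for i in range(len(x) - width + 1)]

def label(slope, cut=0.5):
    """Assign a slope to one of three coarse intervals (the pattern)."""
    if slope > cut:
        return "up"
    if slope < -cut:
        return "down"
    return "flat"

def transmission_edges(x, y, width):
    """Nodes are patterns; edge weights count transitions between windows."""
    patterns = [label(s) for s in window_slopes(x, y, width)]
    edges = Counter(zip(patterns[:-1], patterns[1:]))
    return patterns, edges

# Hypothetical pair of series: y rises with x, then falls.
t = np.arange(20, dtype=float)
x = t
y = np.where(t <= 10, t, 20.0 - t)

patterns, edges = transmission_edges(x, y, width=5)
```

The resulting `edges` counter maps each (from, to) pair of patterns to its transition frequency, which is the edge weight of the network; self-loops record how long a pattern persists.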

1. A Comparison of Linear Regression Methods for the Detection of Apple Internal Quality by Near-Infrared Spectroscopy

Zhu, Dazhou; Ji, Baoping; Meng, Chaoying; Shi, Bolin; Tu, Zhenhua; Qing, Zhaoshen

Hybrid linear analysis (HLA), partial least-squares (PLS) regression, and the linear least-squares support vector machine (LSSVM) were used to determine the soluble solids content (SSC) of apple by Fourier transform near-infrared (FT-NIR) spectroscopy. The performance of these three linear regression methods was compared. Results showed that HLA could be used for the analysis of complex solid samples such as apple. The predictive ability of the SSC model constructed by HLA was comparable to that of PLS. HLA was sensitive to outliers, so outliers should be eliminated before HLA calibration. Linear LSSVM performed better than PLS and HLA. Direct orthogonal signal correction (DOSC) pretreatment was effective for PLS and linear LSSVM, but not suitable for HLA. The combination of DOSC and linear LSSVM had good generalization ability and was not sensitive to outliers, so it is a promising method for linear multivariate calibration.

2. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design

PubMed Central

2014-01-01

Background: In biomedical research, response variables are often encountered which have bounded support on the open unit interval - (0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. Methods: In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. Results: If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar

3. An improved multiple linear regression and data analysis computer program package

NASA Technical Reports Server (NTRS)

Sidik, S. M.

1972-01-01

NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

4. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

ERIC Educational Resources Information Center

Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

2006-01-01

Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

5. Application of wavelet-based multiple linear regression model to rainfall forecasting in Australia

He, X.; Guan, H.; Zhang, X.; Simmons, C.

2013-12-01

In this study, a wavelet-based multiple linear regression model is applied to forecast monthly rainfall in Australia by using monthly historical rainfall data and climate indices as inputs. The wavelet-based model is constructed by incorporating the multi-resolution analysis (MRA) with the discrete wavelet transform and multiple linear regression (MLR) model. The standardized monthly rainfall anomaly and large-scale climate index time series are decomposed using MRA into a certain number of component subseries at different temporal scales. The hierarchical lag relationship between the rainfall anomaly and each potential predictor is identified by cross correlation analysis with a lag time of at least one month at different temporal scales. The components of predictor variables with known lag times are then screened with a stepwise linear regression algorithm to be selectively included into the final forecast model. The MRA-based rainfall forecasting method is examined with 255 stations over Australia, and compared to the traditional multiple linear regression model based on the original time series. The models are trained with data from the 1959-1995 period and then tested in the 1996-2008 period for each station. The performance is compared with observed rainfall values, and evaluated by common statistics of relative absolute error and correlation coefficient. The results show that the wavelet-based regression model provides considerably more accurate monthly rainfall forecasts for all of the selected stations over Australia than the traditional regression model.

6. Comparison of Linear and Non-Linear Regression Models to Estimate Leaf Area Index of Dryland Shrubs.

Dashti, H.; Glenn, N. F.; Ilangakoon, N. T.; Mitchell, J.; Dhakal, S.; Spaete, L.

2015-12-01

Leaf area index (LAI) is a key parameter in global ecosystem studies. LAI is considered a forcing variable in land surface processing models since ecosystem dynamics are highly correlated to LAI. In response to environmental limitations, plants in semiarid ecosystems have smaller leaf area, making accurate estimation of LAI by remote sensing a challenging issue. Optical remote sensing (400-2500 nm) techniques to estimate LAI are based either on radiative transfer models (RTMs) or statistical approaches. Considering the complex radiation field of dry ecosystems, simple 1-D RTMs lead to poor results, and on the other hand, inversion of more complex 3-D RTMs is a demanding task which requires the specification of many variables. A good alternative to physical approaches is using methods based on statistics. Similar to many natural phenomena, there is a non-linear relationship between LAI and top of canopy electromagnetic waves reflected to optical sensors. Non-linear regression models can better capture this relationship. However, considering the problem of a few numbers of observations in comparison to the feature space (nlinear models. In this study linear versus non-linear regression techniques were investigated to estimate LAI. Our study area is located in southwestern Idaho, Great Basin. Sagebrush (Artemisia tridentata spp) serves a critical role in maintaining the structure of this ecosystem. Using a leaf area meter (Accupar LP-80), LAI values were measured in the field. Linear Partial Least Square regression and non-linear, tree based Random Forest regression have been implemented to estimate the LAI of sagebrush from hyperspectral data (AVIRIS-ng) collected in late summer 2014. Cross validation of results indicate that PLS can provide comparable results to Random Forest.

7. Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.

PubMed

Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C

2014-03-01

In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices. PMID:24608685
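The comparison between linear regression and kernel ridge regression (KRR) described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Gaussian (RBF) kernel, the values of `lam` and `gamma`, and the synthetic one-dimensional data are not taken from the study, which used electromyographic features.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def krr_fit(X, y, lam, gamma):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    """Predict as a kernel-weighted sum over the training points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Synthetic nonlinear data (hypothetical, for illustration only).
X = np.linspace(0.0, 3.0, 30).reshape(-1, 1)
y = np.sin(2.0 * X[:, 0])

alpha = krr_fit(X, y, lam=1e-3, gamma=2.0)
krr_mse = np.mean((krr_predict(X, alpha, X, gamma=2.0) - y) ** 2)

# Ordinary linear regression baseline with an intercept.
A = np.column_stack([np.ones(len(X)), X[:, 0]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
lin_mse = np.mean((A @ coef - y) ** 2)
```

On data like these the linear baseline cannot follow the curvature, while KRR can. The abstract's further point is that a suitable feature transformation may let a linear model close that gap at lower computational cost; that step is not shown here.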

8. Solution of the linear regression problem using matrix correction methods in the l_1 metric

Gorelik, V. A.; Trembacheva (Barkalova), O. S.

2016-02-01

The linear regression problem is considered as an improper interpolation problem. The l_1 metric is used to correct (approximate) all the initial data. A probabilistic justification of this metric in the case of the exponential noise distribution is given. The original improper interpolation problem is reduced to a set of a finite number of linear programming problems. The corresponding computational algorithms are implemented in MATLAB.

9. Prediction models for CO2 emission in Malaysia using best subsets regression and multi-linear regression

Tan, C. H.; Matjafri, M. Z.; Lim, H. S.

2015-10-01

This paper presents prediction models that analyze and compute CO2 emissions in Malaysia. Each prediction model is analyzed for three main groups: transportation; electricity and heat production; and residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. The best subsets method is used to remove irrelevant data, followed by multi-linear regression to produce the prediction models. The high R-square (prediction) values obtained imply that the models are reliable for predicting CO2 emissions from specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purposes.

10. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

PubMed

2014-06-01

Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. PMID:25079842

11. Using Simple Linear Regression to Assess the Success of the Montreal Protocol in Reducing Atmospheric Chlorofluorocarbons

ERIC Educational Resources Information Center

Nelson, Dean

2009-01-01

Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…

12. A Comparison of Robust and Nonparametric Estimators under the Simple Linear Regression Model.

ERIC Educational Resources Information Center

Nevitt, Jonathan; Tam, Hak P.

This study investigates parameter estimation under the simple linear regression model for situations in which the underlying assumptions of ordinary least squares estimation are untenable. Classical nonparametric estimation methods are directly compared against some robust estimation methods for conditions in which varying degrees of outliers are…

13. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

ERIC Educational Resources Information Center

Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

2013-01-01

Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

14. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

ERIC Educational Resources Information Center

Cooper, Paul D.

2010-01-01

A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

15. Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression

ERIC Educational Resources Information Center

Thomas, D. Roland; Zhu, PengCheng; Decady, Yves J.

2007-01-01

The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an…

16. Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course

ERIC Educational Resources Information Center

Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna

2010-01-01

Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…

17. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

EPA Science Inventory

Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

18. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

PubMed

Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

2016-03-01

In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. PMID:26774211
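The split-half design described above (fit on one random half, assess predictive accuracy on the held-out half) can be sketched for the linear regression side as follows. The data are synthetic and hypothetical; only the sample sizes (664 split into 332 and 332) come from the abstract, and the fuzzy logic model is not reproduced.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_squared_error(x, y, slope, intercept):
    """Average squared prediction error of the fitted line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)) / len(x)

random.seed(0)
# Hypothetical data: a linear signal plus Gaussian noise.
data = [(float(i), 2.0 * i + 1.0 + random.gauss(0.0, 1.0)) for i in range(664)]
random.shuffle(data)
train, holdout = data[:332], data[332:]

slope, intercept = fit_line([p[0] for p in train], [p[1] for p in train])
fit_mse = mean_squared_error([p[0] for p in train], [p[1] for p in train], slope, intercept)
holdout_mse = mean_squared_error([p[0] for p in holdout], [p[1] for p in holdout], slope, intercept)
```

Comparing `fit_mse` with `holdout_mse` separates in-sample fit from predictive accuracy on new cases, which is the distinction the two comparisons in the abstract draw.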

19. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

NASA Technical Reports Server (NTRS)

MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

2005-01-01

Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

20. Age-adjusted Labor Force Participation Rates, 1960-2045.

ERIC Educational Resources Information Center

Szafran, Robert F.

2002-01-01

A proposed new age-adjusted measure for calculating labor force participation rate eliminates the effect of changes in the age distribution. According to the new criterion, increases in women's labor force participation from 1960-2000 would have been even greater if shifts in the age distribution had not occurred. (Contains 12 references.) (JOW)
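The abstract does not give the formula for the proposed measure. One common way to remove the effect of a changing age distribution is direct standardization, in which each age group's rate is weighted by that group's share of a fixed standard population. The sketch below shows that general technique and may differ from the author's measure; the age groups, rates, and standard population are hypothetical.

```python
def age_adjusted_rate(age_specific_rates, standard_population):
    """Direct standardization: weight each age group's rate by the
    share of that group in a fixed standard population."""
    total = sum(standard_population.values())
    return sum(
        rate * standard_population[group] / total
        for group, rate in age_specific_rates.items()
    )

# Hypothetical participation rates by age group (not from the article).
rates_1960 = {"16-24": 0.50, "25-54": 0.60, "55+": 0.40}
rates_2000 = {"16-24": 0.55, "25-54": 0.80, "55+": 0.35}
# Hypothetical fixed standard age distribution (counts in any unit).
standard = {"16-24": 20, "25-54": 55, "55+": 25}

adj_1960 = age_adjusted_rate(rates_1960, standard)
adj_2000 = age_adjusted_rate(rates_2000, standard)
```

Because both years are weighted by the same standard population, the difference between `adj_2000` and `adj_1960` reflects changes in the age-specific rates only, not shifts in the age mix.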

1. Use of multivariate linear regression and support vector regression to predict functional outcome after surgery for cervical spondylotic myelopathy.

PubMed

Hoffman, Haydn; Lee, Sunghoon I; Garst, Jordan H; Lu, Derek S; Li, Charles H; Nagasawa, Daniel T; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C

2015-09-01

This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R(2)) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R(2)=0.452; MAD=0.0887; p=1.17 × 10(-3)). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R(2)=0.932; MAD=0.0283; p=5.73 × 10(-12)). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate. PMID:26115898

2. Use of multivariate linear regression and support vector regression to predict functional outcome after surgery for cervical spondylotic myelopathy

PubMed Central

Hoffman, Haydn; Lee, Sunghoon Ivan; Garst, Jordan H.; Lu, Derek S.; Li, Charles H.; Nagasawa, Daniel T.; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H.; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C.

2016-01-01

This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R^2) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R^2 = 0.452; MAD = 0.0887; p = 1.17 × 10^-3). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R^2 = 0.932; MAD = 0.0283; p = 5.73 × 10^-12). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate. PMID:26115898
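The two agreement measures named in this abstract, the coefficient of determination and the mean absolute difference, can be computed as in the sketch below. The definition of the coefficient of determination used here (one minus the ratio of residual to total sum of squares) is an assumption, since the abstract does not say which variant was used, and the scores are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mean_absolute_difference(actual, predicted):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical postoperative scores on a 0-1 scale (not study data).
actual = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

r2 = r_squared(actual, predicted)
mad = mean_absolute_difference(actual, predicted)
```

A higher coefficient of determination and a lower mean absolute difference both indicate closer agreement, which is the sense in which the abstract ranks the SVR model above the MLR model.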

3. CANFIS: A non-linear regression procedure to produce statistical air-quality forecast models

SciTech Connect

Burrows, W.R.; Montpetit, J.; Pudykiewicz, J.

1997-12-31

Statistical models for forecasts of environmental variables can provide a good trade-off between significance and precision in return for substantial saving of computer execution time. Recent non-linear regression techniques give significantly increased accuracy compared to traditional linear regression methods. Two are Classification and Regression Trees (CART) and the Neuro-Fuzzy Inference System (NFIS). Both can model predictand distributions, including the tails, with much better accuracy than linear regression. Given a learning data set of matched predictand and predictors, CART regression produces a non-linear, tree-based, piecewise-continuous model of the predictand data. Its variance-minimizing procedure optimizes the task of predictor selection, often greatly reducing initial data dimensionality. NFIS reduces dimensionality by a procedure known as subtractive clustering but it does not of itself eliminate predictors. Over-lapping coverage in predictor space is enhanced by NFIS with a Gaussian membership function for each cluster component. Coefficients for a continuous response model based on the fuzzified cluster centers are obtained by a least-squares estimation procedure. CANFIS is a two-stage data-modeling technique that combines the strength of CART to optimize the process of selecting predictors from a large pool of potential predictors with the modeling strength of NFIS. A CANFIS model requires negligible computer time to run. CANFIS models for ground-level O3, particulates, and other pollutants will be produced for each of about 100 Canadian sites. The air-quality models will run twice daily using a small number of predictors isolated from a large pool of upstream and local Lagrangian potential predictors.

4. About the multiple linear regressions applied in studying the solvatochromic effects.

PubMed

Dorohoi, Dana-Ortansa

2010-03-01

Statistical analysis is applied to study solvatochromic effects using the solvent parameters (regressors) that influence the spectral shifts in electronic spectra. The analysis aims to eliminate the non-significant parameters and the aberrant points (for which supplemental interactions were neglected in the theories used) from the data subjected to multi-linear regression. A BASIC program allows these steps to be followed one by one. To illustrate the regression steps, the wavenumbers of the maximum of the pi-pi* absorption band of three benzene derivatives in various solvents were used. PMID:20089443

5. The stratospheric response to external factors based on MERRA data using linear multivariate linear regression analysis

Kozubek, M.; Rozanov, E.; Krizan, P.

2014-09-01

The stratosphere is influenced by many external forcings (natural or anthropogenic). There are many studies which are focused on this problem and that is why we can compare our results with them. This study is focused on the variability and trends of temperature and circulation characteristics (zonal and meridional wind component) in connection with different phenomena variation in the stratosphere and lower mesosphere. We consider the interactions between the troposphere-stratosphere-lower mesosphere system and external and internal phenomena, e.g. solar cycle, QBO, NAO or ENSO using multiple linear techniques. The analysis was applied to the period 1979-2012 based on the current reanalysis data, mainly the MERRA reanalysis dataset (Modern Era Retrospective-analysis for Research and Applications) for pressure levels: 1000-0.1 hPa. We do not find a strong temperature signal for solar flux over the tropics about 30 hPa (ERA-40 results) but the strong positive signal has been observed near stratopause almost in the whole analyzed area. This could indicate that solar forcing is not represented well in the higher pressure levels in MERRA. The analysis of ENSO and ENSO Modoki shows that we should take into account more than one ENSO index for similar analysis. Previous studies show that the volcanic activity is important parameter. The signal of volcanic activity in MERRA is very weak and insignificant.

6. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models.

PubMed

Liu, Dawei; Lin, Xihong; Ghosh, Debashis

2007-12-01

We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations. PMID:18078480

7. Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation

Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.

A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC_non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC)_pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC_non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC_non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
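Two of the estimators compared above, Deming regression and the ratio of averages (ROA), have simple closed forms and can be sketched as follows. The function names, the parameter `delta` (taken here as the ratio of the error variance in y to the error variance in x), and the data are assumptions for illustration; York regression, which uses per-point uncertainties, is not shown.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Deming regression for errors in both variables.

    delta is the assumed ratio of the error variance in y to the
    error variance in x. Returns (slope, intercept).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    return slope, my - slope * mx

def ratio_of_averages(x, y):
    """ROA slope estimate: mean(y) / mean(x), which forces a zero intercept."""
    return (sum(y) / len(y)) / (sum(x) / len(x))

# Hypothetical EC and OC concentrations lying exactly on OC = 2 * EC.
ec = [1.0, 2.0, 3.0, 4.0]
oc = [2.0, 4.0, 6.0, 8.0]

slope, intercept = deming_fit(ec, oc, delta=1.0)
roa = ratio_of_averages(ec, oc)
```

With a true intercept of zero the two estimates coincide, as in this example. When the intercept is not zero, ROA absorbs it into the slope, which is consistent with the abstract's caution about assuming a zero non-combustion OC value.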

8. Island embryo regression driven by a beam of self-ions in the linear regime

Flynn, C. P.

2010-10-01

The kinetics of island growth and regression are discussed under the approximation of linear response, including the Gibbs-Thompson potential, for a reacting assembly of adatoms and advacancies (thermal defects) on a surface irradiated with a beam of self-ions. First the quasistatic growth or shrinkage rate, for islands of size $n$ less than the critical size $\hat{n}$, is calculated for the driven system, exactly, for linear response. This result is employed to determine successively: (i) the regression rate of driven embryo islands with $n < \hat{n}$; and (ii) the structure of the steady state decay chain established when embryos of a particular size $n_0 < \hat{n}$ are created by ion beam impacts. The changed embryo distribution caused by irradiation differs markedly from the populations of the embryos at equilibrium.

9. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

USGS Publications Warehouse

Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.

2009-01-01

Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.
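The kind of regional regression described above can be illustrated with a generic weighted least squares fit. This is a sketch of the general technique, not WREG's own code or interface: the function name, the choice of weights, and the data are hypothetical, and the log-transformed variables are an assumption.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Solve the weighted normal equations (X'WX) beta = X'Wy.

    An intercept column is prepended to X; weights are per-observation.
    """
    design = np.column_stack([np.ones(len(y)), X])
    W = np.diag(weights)
    return np.linalg.solve(design.T @ W @ design, design.T @ W @ y)

# Hypothetical gaged basins: log10 drainage area as the basin characteristic
# and log10 mean annual flow as the streamflow characteristic.
log_area = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
log_flow = 0.5 + 0.8 * log_area
# Hypothetical weights, e.g. giving longer gage records more influence.
weights = np.array([10.0, 25.0, 40.0, 15.0, 30.0])

beta = weighted_least_squares(log_area, log_flow, weights)

# Apply the fitted regional equation at a hypothetical ungaged basin.
ungaged_log_flow = beta[0] + beta[1] * 2.2
```

The fitted coefficients define the regional estimation equation, which can then be evaluated from basin characteristics alone at a site with no streamflow record.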

10. SERF: A Simple, Effective, Robust, and Fast Image Super-Resolver From Cascaded Linear Regression.

PubMed

Hu, Yanting; Wang, Nannan; Tao, Dacheng; Gao, Xinbo; Li, Xuelong

2016-09-01

Example learning-based image super-resolution techniques estimate a high-resolution image from a low-resolution input image by relying on high- and low-resolution image pairs. An important issue for these techniques is how to model the relationship between high- and low-resolution image patches: most existing complex models either generalize hard to diverse natural images or require a lot of time for model training, while simple models have limited representation capability. In this paper, we propose a simple, effective, robust, and fast (SERF) image super-resolver for image super-resolution. The proposed super-resolver is based on a series of linear least squares functions, namely, cascaded linear regression. It has few parameters to control the model and is thus able to robustly adapt to different image data sets and experimental settings. The linear least square functions lead to closed form solutions and therefore achieve computationally efficient implementations. To effectively decrease these gaps, we group image patches into clusters via k-means algorithm and learn a linear regressor for each cluster at each iteration. The cascaded learning process gradually decreases the gap of high-frequency detail between the estimated high-resolution image patch and the ground truth image patch and simultaneously obtains the linear regression parameters. Experimental results show that the proposed method achieves superior performance with lower time consumption than the state-of-the-art methods. PMID:27323364

11. Speaker adaptation of HMMs using evolutionary strategy-based linear regression

Selouani, Sid-Ahmed; O'Shaughnessy, Douglas

2002-05-01

A new framework for speaker adaptation of continuous-density hidden Markov models (HMMs) is introduced. It aims to improve the robustness of speech recognizers by adapting HMM parameters to new conditions (e.g., from new speakers). It describes an optimization technique using an evolutionary strategy for linear regression-based spectral transformation. In classical iterative maximum likelihood linear regression (MLLR), a global transform matrix is estimated to make a general model better match particular target conditions. To permit adaptation on a small amount of data, a regression tree classification is performed. However, an important drawback of MLLR is that the number of regression classes is fixed. The new approach allows the degree of freedom of the global transform to be implicitly variable, as the evolutionary optimization permits the survival of only active classes. The fitness function is evaluated by the phoneme correctness through the evolution steps. The implementation requirements such as chromosome representation, selection function, genetic operators, and evaluation function have been chosen in order to lend more reliability to the global transformation matrix. Triphone experiments used the TIMIT and ARPA-RM1 databases. For new speakers, the new technique achieves 8 percent fewer word errors than the basic MLLR method.

12. Anthropometric influence on physical fitness among preschool children: gender-specific linear and curvilinear regression models.

PubMed

Kondric, Miran; Trajkovski, Biljana; Strbad, Maja; Foretić, Nikola; Zenić, Natasa

2013-12-01

There is an evident lack of studies investigating the morphological influence on physical fitness (PF) among preschool children. The aim of this study was (1) to calculate and interpret linear and nonlinear relationships between simple anthropometric predictors and PF criteria among preschoolers of both genders, and (2) to find critical values of the anthropometric predictors which should be recognized as the breakpoint of the negative influence on the PF. The sample of subjects consisted of 413 preschoolers aged 4 to 6 (mean age, 5.08 years; 176 girls and 237 boys), from Rijeka, Croatia. The anthropometric variables included body height (BH), body weight (BW), sum of triceps and subscapular skinfold (SUMSF), and calculated BMI (BMI = BW (kg)/BH (m)²). The PF was screened through testing of flexibility, repetitive strength, explosive strength, and agility. Linear and nonlinear (general quadratic model y = a + bx + cx²) regressions were calculated and interpreted simultaneously. BH and BW are far better predictors of the physical fitness status than BMI and SUMSF. In all calculated regressions excluding the flexibility criterion, linear and nonlinear prediction of the PF through BH and BW reached statistical significance, indicating an influence of the advancement in maturity status on PF variables. Differences between linear and nonlinear regressions are smaller in males than in females. There are some indications that the age of 4 to 6 years is a critical period in the prevention of obesity, mostly because the extensively studied and proven negative influence of overweight and adiposity on PF tests is not yet evident. In some cases we have found evident regression breakpoints (approximately 25 kg in boys), which should be interpreted as critical values of the anthropometric measures for the studied sample of subjects. PMID:24611341
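
The comparison of a linear fit with the general quadratic model y = a + bx + cx² can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the function name `fit_linear_and_quadratic` is hypothetical, and reading the "breakpoint" as the turning point of the fitted parabola, x = -b/(2c), is an assumption made here rather than the procedure reported in the abstract.

```python
import numpy as np

def fit_linear_and_quadratic(x, y):
    """Fit y = a + b*x and y = a + b*x + c*x**2 by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns coefficients from the highest degree to the lowest.
    b1, a1 = np.polyfit(x, y, 1)
    c2, b2, a2 = np.polyfit(x, y, 2)
    # Turning point of the parabola (assumed reading of "breakpoint");
    # undefined when the quadratic term vanishes.
    turning_point = -b2 / (2.0 * c2) if c2 != 0 else None
    return (a1, b1), (a2, b2, c2), turning_point

# Synthetic, noise-free data (hypothetical): a fitness score that peaks at 25 kg.
weight_kg = np.array([15, 18, 20, 22, 25, 28, 30, 33], dtype=float)
score = 10.0 + 2.0 * weight_kg - 0.04 * weight_kg**2

linear, quadratic, turning_point = fit_linear_and_quadratic(weight_kg, score)
print("linear (a, b):", linear)
print("quadratic (a, b, c):", quadratic)
print("turning point:", turning_point)
```

With real measurements the two fits would normally be compared on explained variance or residual error before interpreting the turning point.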

13. INAKT--an interactive non-linear regression program for enzyme inactivation and affinity labelling studies.

PubMed

Christophersen, A; McKinley-McKee, J S

1984-01-01

An interactive program for analysing enzyme activity-time data using non-linear regression analysis is described. Protection studies can also be dealt with. The program computes inactivation rates, dissociation constants and promotion or inhibition parameters with their standard errors. It can also be used to distinguish different inactivation models. The program is written in SIMULA and is menu-oriented for refining or correcting data at the different levels of computing. PMID:6546558

14. An hourly PM10 diagnosis model for the Bilbao metropolitan area using a linear regression methodology.

PubMed

González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O

2013-07-01

There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas, becoming one of the main air pollution concerns. This is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative (log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training dataset (6 years) and the testing dataset (1 year). The results of each type of model show that the INT-BIC-based model (R² = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the-art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 for seasonal means) with respect to shorter periods. PMID:23247520

15. Comparison of various texture classification methods using multiresolution analysis and linear regression modelling.

PubMed

Dhanya, S; Kumari Roshni, V S

2016-01-01

Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform. PMID:26835234

16. Hypothesis testing in functional linear regression models with Neyman's truncation and wavelet thresholding for longitudinal data.

PubMed

Yang, Xiaowei; Nie, Kun

2008-03-15

Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data. PMID:17610294

17. Quantitative prediction of integrase inhibitor resistance from genotype through consensus linear regression modeling

PubMed Central

2013-01-01

Background: Integrase inhibitors (INI) form a new drug class in the treatment of HIV-1 patients. We developed a linear regression modeling approach to make a quantitative raltegravir (RAL) resistance phenotype prediction, as Fold Change in IC50 against a wild type virus, from mutations in the integrase genotype. Methods: We developed a clonal genotype-phenotype database with 991 clones from 153 clinical isolates of INI naïve and RAL treated patients, and 28 site-directed mutants. We developed the RAL linear regression model in two stages, employing a genetic algorithm (GA) to select integrase mutations by consensus. First, we ran multiple GAs to generate first order linear regression models (GA models) that were stochastically optimized to reach a goal R² accuracy, and consisted of a fixed-length subset of integrase mutations to estimate INI resistance. Secondly, we derived a consensus linear regression model in a forward stepwise regression procedure, considering integrase mutations or mutation pairs by descending prevalence in the GA models. Results: The most frequently occurring mutations in the GA models were 92Q, 97A, 143R and 155H (all 100%), 143G (90%), 148H/R (89%), 148K (88%), 151I (81%), 121Y (75%), 143C (72%), and 74M (69%). The RAL second order model contained 30 single mutations and five mutation pairs (p < 0.01): 143C/R&97A, 155H&97A/151I and 74M&151I. The R² performance of this model on the clonal training data was 0.97, and 0.78 on an unseen population genotype-phenotype dataset of 171 clinical isolates from RAL treated and INI naïve patients. Conclusions: We describe a systematic approach to derive a model for predicting INI resistance from a limited amount of clonal samples. Our RAL second order model is made available as an Additional file for calculating a resistance phenotype as the sum of integrase mutations and mutation pairs. PMID:23282253

18. Distributed Monitoring of the R² Statistic for Linear Regression

NASA Technical Reports Server (NTRS)

Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.

2011-01-01

The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
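
The monitoring rule described in this abstract (recompute the model once R² falls below a fixed threshold) can be illustrated with a small centralized sketch. It is not DReMo itself: the distributed, communication-efficient part of the algorithm is not reproduced, all data are assumed to sit in one place, and the function names `r_squared`, `fit_ols` and `monitor` are hypothetical.

```python
import numpy as np

def r_squared(X, y, beta):
    """Coefficient of determination of the model y ~ X @ beta."""
    residual = y - X @ beta
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def fit_ols(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def monitor(X, y, beta, threshold):
    """Return (beta, refit): refit the model only if R^2 is below threshold."""
    if r_squared(X, y, beta) < threshold:
        return fit_ols(X, y), True
    return beta, False

# Hypothetical usage: a stale model no longer explains the current data.
X = np.column_stack([np.ones(5), np.arange(5.0)])  # intercept + one feature
y = 1.0 + 2.0 * np.arange(5.0)
stale_beta = np.zeros(2)
beta, refit = monitor(X, y, stale_beta, threshold=0.9)
print(refit, beta)
```

In the distributed setting the abstract describes, each node holds only part of `X` and `y`, so the threshold test would have to be evaluated without gathering the data centrally.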

19. Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models

PubMed Central

Elliott, Michael R.

2012-01-01

In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create “data driven” weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical. PMID:23275683
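
Plain weight trimming, as described above (weights larger than a maximum value are reduced to that value), can be sketched as follows. This shows only the fixed-cutoff baseline, not the Bayesian model averaging estimators the article develops; the function names and the cutoff of 10 are hypothetical choices for illustration.

```python
import numpy as np

def trim_weights(weights, max_weight):
    """Cap every survey weight at max_weight."""
    return np.minimum(np.asarray(weights, dtype=float), max_weight)

def weighted_fit(x, y, weights):
    """Weighted least-squares (intercept, slope) for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    # Solve the weighted normal equations (X' W X) beta = X' W y.
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Hypothetical inverse-probability weights with one extreme value.
raw = [1.0, 2.0, 50.0, 4.0]
trimmed = trim_weights(raw, max_weight=10.0)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print("fully weighted:", weighted_fit(x, y, raw))
print("trimmed:", weighted_fit(x, y, trimmed))
```

Comparing the two printed fits shows how much a single large weight can move the estimate, which is the bias-variance trade-off the abstract refers to.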

20. Multiple regression technique for Pth degree polynomials with and without linear cross products

NASA Technical Reports Server (NTRS)

Davis, J. W.

1973-01-01

A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products; these programs evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered, and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent error are evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique; they show the output formats and typical plots comparing computer results to each set of input data.
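
A Pth-degree polynomial regression in N independent variables, with or without cross products, amounts to ordinary least squares on an expanded design matrix. The sketch below is a modern illustration rather than the programs described in the report. It assumes that "linear cross products" means the pairwise products x_i * x_j of distinct variables, and the function name `design_matrix` is hypothetical.

```python
from itertools import combinations

import numpy as np

def design_matrix(X, degree, cross_products=False):
    """Columns: 1, then x_j, x_j^2, ..., x_j^degree for each variable j,
    optionally followed by the pairwise products x_i * x_j (i < j)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    columns = [np.ones(n)]
    for j in range(k):
        for p in range(1, degree + 1):
            columns.append(X[:, j] ** p)
    if cross_products:
        for i, j in combinations(range(k), 2):
            columns.append(X[:, i] * X[:, j])
    return np.column_stack(columns)

# Synthetic example on a 4 x 4 grid of two independent variables.
g = np.arange(4.0)
x1, x2 = [a.ravel() for a in np.meshgrid(g, g)]
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2**2 + 0.5 * x1 * x2

A = design_matrix(X, degree=2, cross_products=True)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # order: 1, x1, x1^2, x2, x2^2, x1*x2
```

Dropping `cross_products=True` gives the model without interaction terms, so the two cases can be compared on the same data.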

1. Aboveground biomass and carbon stocks modelling using non-linear regression model

Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd

2016-06-01

Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forest because of the variation in species biodiversity and the complex structure of tropical rainforest. Nevertheless, tropical rainforest is the most extensive forest in the world, with a vast diversity of trees and layered canopies. Using optical sensors integrated with empirical models is a common way to assess AGB. Using regression, the linkage between remote sensing and a biophysical parameter of the forest can be established. Therefore, this paper illustrates the accuracy of a non-linear (quadratic) regression equation for estimating the AGB and carbon stocks of the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters from field plots and the remotely sensed data using a nonlinear regression model. The results showed a good relationship between crown projection area (CPA) and carbon stocks (CS), with a Pearson correlation coefficient (r) of 0.671 (p < 0.01). The study concluded that integrating Worldview-3 imagery with the LiDAR-based canopy height model (CHM) raster was useful for quantifying the AGB and carbon stocks over a larger sample area of the lowland Dipterocarp forest.

2. Inference of dense spectral reflectance images from sparse reflectance measurement using non-linear regression modeling

Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.

2015-09-01

One method to acquire multispectral images is to sequentially capture a series of images where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy to capturing multispectral data is to infer this data using sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data. However, these assumptions may not always hold true for all data. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, which is an ensemble of decision trees and trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.

3. Application of dynamic linear regression to improve the skill of ensemble-based deterministic ozone forecasts

SciTech Connect

Pagowski, M O; Grell, G A; Devenyi, D; Peckham, S E; McKeen, S A; Gong, W; Monache, L D; McHenry, J N; McQueen, J; Lee, P

2006-02-02

Forecasts from seven air quality models and surface ozone data collected over the eastern USA and southern Canada during July and August 2004 provide a unique opportunity to assess benefits of ensemble-based ozone forecasting and devise methods to improve ozone forecasts. In this investigation, past forecasts from the ensemble of models and hourly surface ozone measurements at over 350 sites are used to issue deterministic 24-h forecasts using a method based on dynamic linear regression. Forecasts of hourly ozone concentrations as well as maximum daily 8-h and 1-h averaged concentrations are considered. It is shown that the forecasts issued with the application of this method have reduced bias and root mean square error and better overall performance scores than any of the ensemble members and the ensemble average. Performance of the method is similar to another method based on linear regression described previously by Pagowski et al., but unlike the latter, the current method does not require measurements from multiple monitors since it operates on individual time series. Improvement in the forecasts can be easily implemented and requires minimal computational cost.

4. Estimating leaf photosynthetic pigments information by stepwise multiple linear regression analysis and a leaf optical model

Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei

2014-10-01

Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and also has difficulty capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. the PROSPECT model) was employed to simulate the leaf reflectance for wavelengths from 400 to 780 nm at 1 nm intervals, and these values were then treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were each taken as the target to build a regression model. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures; 70% of these samples were used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log (1/r)) were each employed to build a regression model. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R²) larger than 0.8 when 6 wavebands were selected to build the SMLR model. The largest values of R² for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematical transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate chlorophyll, carotenoids and their ratio based on a statistical model with leaf reflectance data.

5. Causal correlation of foliar biochemical concentrations with AVIRIS spectra using forced entry linear regression

NASA Technical Reports Server (NTRS)

Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.

1995-01-01

link between wavelengths chosen by stepwise regression and the biochemical of interest, and this in turn has cast doubts on the use of imaging spectrometry for the estimation of foliar biochemical concentrations at sites distant from the training sites. To investigate this problem, an analysis was conducted on the variation in canopy biochemical concentrations and reflectance spectra using forced entry linear regression.

6. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

ERIC Educational Resources Information Center

Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

2013-01-01

This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)

7. A Simulation-Based Comparison of Several Stochastic Linear Regression Methods in the Presence of Outliers.

ERIC Educational Resources Information Center

Rule, David L.

Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…

8. Chicken barn climate and hazardous volatile compounds control using simple linear regression and PID

Abdullah, A. H.; Bakar, M. A. A.; Shukor, S. A. A.; Saad, F. S. A.; Kamis, M. S.; Mustafa, M. H.; Khalid, N. S.

2016-07-01

Hazardous volatile compounds from chicken manure in a chicken barn are a potential health threat to farm animals and workers. Ammonia (NH3) and hydrogen sulphide (H2S) produced in the chicken barn are influenced by climate changes. An Electronic Nose (e-nose) is used to sample the barn's air, temperature and humidity data. Simple linear regression is used to identify the correlation between temperature and humidity, humidity and ammonia, and ammonia and hydrogen sulphide. MATLAB Simulink software was used for the sample data analysis using a PID controller. Results show that a PID controller using the Ziegler-Nichols technique can improve the system's control of the climate in the chicken barn.

9. Discriminative Feature Extraction via Multivariate Linear Regression for SSVEP-Based BCI.

PubMed

Wang, Haiqiang; Zhang, Yu; Waytowich, Nicholas R; Krusienski, Dean J; Zhou, Guoxu; Jin, Jing; Wang, Xingyu; Cichocki, Andrzej

2016-05-01

Many of the most widely accepted methods for reliable detection of steady-state visual evoked potentials (SSVEPs) in the electroencephalogram (EEG) utilize canonical correlation analysis (CCA). CCA uses pure sine and cosine reference templates with frequencies corresponding to the visual stimulation frequencies. These generic reference templates may not optimally reflect the natural SSVEP features obscured by the background EEG. This paper introduces a new approach that utilizes spatio-temporal feature extraction with multivariate linear regression (MLR) to learn discriminative SSVEP features for improving the detection accuracy. MLR is implemented on dimensionality-reduced EEG training data and a constructed label matrix to find optimally discriminative subspaces. Experimental results show that the proposed MLR method significantly outperforms CCA as well as several other competing methods for SSVEP detection, especially for time windows shorter than 1 second. This demonstrates that the MLR method is a promising new approach for achieving improved real-time performance of SSVEP-BCIs. PMID:26812728

10. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis.

PubMed

Rubio, Francisco J; Genton, Marc G

2016-06-30

We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26856806

11. The overlooked potential of Generalized Linear Models in astronomy, I: Binomial regression

de Souza, R. S.; Cameron, E.; Killedar, M.; Hilbe, J.; Vilalta, R.; Maio, U.; Biffi, V.; Ciardi, B.; Riggs, J. D.

2015-09-01

Revealing hidden patterns in astronomical data is often the path to fundamental scientific breakthroughs; meanwhile the complexity of scientific enquiry increases as more subtle relationships are sought. Contemporary data analysis problems often elude the capabilities of classical statistical techniques, suggesting the use of cutting edge statistical methods. In this light, astronomers have overlooked a whole family of statistical techniques for exploratory data analysis and robust regression, the so-called Generalized Linear Models (GLMs). In this paper, the first in a series aimed at illustrating the power of these methods in astronomical applications, we elucidate the potential of a particular class of GLMs for handling binary/binomial data, the so-called logit and probit regression techniques, from both a maximum likelihood and a Bayesian perspective. As a case in point, we present the use of these GLMs to explore the conditions of star formation activity and metal enrichment in primordial minihaloes from cosmological hydro-simulations including detailed chemistry, gas physics, and stellar feedback. We predict that for a dark mini-halo with metallicity ≈ 1.3 × 10⁻⁴ Z⊙, an increase of 1.2 × 10⁻² in the gas molecular fraction increases the probability of star formation occurrence by a factor of 75%. Finally, we highlight the use of receiver operating characteristic curves as a diagnostic for binary classifiers, and ultimately we use these to demonstrate the competitive predictive performance of GLMs against the popular technique of artificial neural networks.

12. Evaluation of multivariate linear regression and artificial neural networks in prediction of water quality parameters

PubMed Central

2014-01-01

This paper examined the efficiency of multivariate linear regression (MLR) and artificial neural network (ANN) models in prediction of two major water quality parameters in a wastewater treatment plant. Biochemical oxygen demand (BOD) and chemical oxygen demand (COD) as well as indirect indicators of organic matters are representative parameters for sewer water quality. Performance of the ANN models was evaluated using coefficient of correlation (r), root mean square error (RMSE) and bias values. The values of BOD and COD computed by the model, the ANN method and regression analysis were in close agreement with their respective measured values. Results showed that the ANN model performed better than the MLR model. Comparative indices of the optimized ANN with input values of temperature (T), pH, total suspended solid (TSS) and total suspended (TS) were RMSE = 25.1 mg/L, r = 0.83 for prediction of BOD and RMSE = 49.4 mg/L, r = 0.81 for prediction of COD. It was found that the ANN model could be employed successfully in estimating the BOD and COD in the inlet of wastewater biochemical treatment plants. Moreover, sensitivity analysis results showed that pH has more effect on BOD and COD prediction than the other parameters. Also, both implemented models predicted BOD better than COD. PMID:24456676

13. Modeling the Philippines' real gross domestic product: A normal estimation equation for multiple linear regression

Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.

2016-02-01

The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can influence the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01. The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB), resulting in a total of 96 observations for each variable. The data, particularly the Dependent Variable (y), underwent a logarithmic transformation to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The regression analysis gives a coefficient of determination of 98.7%, meaning that the Independent Variables explain 98.7% of the variation in the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
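
The normal-equation estimate that this kind of matrix formulation usually refers to is β = (XᵀX)⁻¹Xᵀy, applied here to a log-transformed dependent variable. The sketch below uses Python rather than MATLAB, invented numbers rather than the NSCB data, and a hypothetical function name `normal_equation_fit`. It illustrates the general technique only, not the authors' model.

```python
import numpy as np

def normal_equation_fit(X, y):
    """Solve (X'X) beta = X'y after prepending an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(X)), X])
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Invented predictors (two columns) and a response built so that
# log(y) = 0.5 + 0.1 * x1 + 0.2 * x2 exactly.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y = np.exp(0.5 + 0.1 * X[:, 0] + 0.2 * X[:, 1])

beta = normal_equation_fit(X, np.log(y))  # regress log(y) on x1, x2
print(beta)
```

Predictions on the original scale are obtained by exponentiating the fitted values of log(y).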

14. Predicting students' success at pre-university studies using linear and logistic regressions

Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir

2014-09-01

The study aimed to find the most suitable model for predicting students' success in the medical pre-university studies at the Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were achievements in the national high school exit examination, Sijil Pelajaran Malaysia (SPM), namely Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia results, as well as gender and high school background factors. The outcomes showed a significant difference by gender in the final CGPA and in the Biology and Mathematics subjects at pre-university, and by high school background in the Mathematics subject. In general, the correlation between academic achievement at high school and at medical pre-university is moderately significant at an α-level of 0.05, except for language subjects. It was also found that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models were able to give probabilities that closely matched the real cases. Hence, they could be used to identify students who are likely to qualify for the CUCMS medical faculty before they are accepted into its foundation program.

15. Comparison of K-Means Clustering with Linear Probability Model, Linear Discriminant Function, and Logistic Regression for Predicting Two-Group Membership.

ERIC Educational Resources Information Center

So, Tak-Shing Harry; Peng, Chao-Ying Joanne

This study compared the accuracy of predicting two-group membership obtained from K-means clustering with those derived from linear probability modeling, linear discriminant function, and logistic regression under various data properties. Multivariate normally distributed populations were simulated based on combinations of population proportions,…

16. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

PubMed

Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

2016-04-01

Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. PMID:26720396

17. Spatial disaggregation of carbon dioxide emissions from road traffic based on multiple linear regression model

Shu, Yuqin; Lam, Nina S. N.

2011-01-01

Detailed estimates of carbon dioxide emissions at fine spatial scales are critical to both modelers and decision makers dealing with global warming and climate change. Globally, traffic-related emissions of carbon dioxide are growing rapidly. This paper presents a new method based on a multiple linear regression model to disaggregate traffic-related CO2 emission estimates from the parish-level scale to a 1 × 1 km grid scale. Considering the allocation factors (population density, urban area, income, road density) together, we used a correlation and regression analysis to determine the relationship between these factors and traffic-related CO2 emissions, and developed the best-fit model. The method was applied to downscale the traffic-related CO2 emission values by parish (i.e. county) for the State of Louisiana into 1-km² grid cells. Among the four parishes with the highest traffic-related CO2 emissions, the biggest area with above-average CO2 emissions is found in East Baton Rouge, and the smallest area with no CO2 emissions is also in East Baton Rouge, but Orleans has the most CO2 emissions per unit area. The result reveals that high CO2 emissions are concentrated in the dense road networks of urban areas with high population density, and low CO2 emissions are distributed in rural areas with low population density and sparse road networks. The proposed method can be used to identify the emission "hot spots" at fine scale and is considered more accurate and less time-consuming than the previous methods.

18. A non linear multiple regression approach for inferring the probability distribution of hydrological model errors

Montanari, A.

2006-12-01

This contribution introduces a statistically based approach for uncertainty assessment in hydrological modeling, in an optimality context. Indeed, in several real world applications, there is the need for the user to select a model that is deemed to be the best possible choice according to a given goodness-of-fit criterion. In this case, it is extremely important to assess the model uncertainty, intended as the range around the model output within which the measured hydrological variable is expected to fall with a given probability. This indication allows the user to quantify the risk associated with a decision that is based on the model response. The technique proposed here is carried out by inferring the probability distribution of the hydrological model error through a non linear multiple regression approach, depending on an arbitrary number of selected conditioning variables. These may include the current and previous model output as well as internal state variables of the model. The purpose is to indirectly relate the model error to the sources of uncertainty, through the conditioning variables. The method can be applied to any model of arbitrary complexity, including distributed approaches. The probability distribution of the model error is derived in the Gaussian space, through a meta-Gaussian approach. The normal quantile transform is applied in order to make the marginal probability distribution of the model error and the conditioning variables Gaussian. Then the above marginal probability distributions are related through the multivariate Gaussian distribution, whose parameters are estimated via multiple regression. Application of the inverse of the normal quantile transform allows the user to derive the confidence limits of the model output for an assigned significance level. The proposed technique is valid under statistical assumptions that are essentially those conditioning the validity of the multiple regression in the Gaussian space. Statistical tests

19. Prediction of Depression in Cancer Patients With Different Classification Criteria, Linear Discriminant Analysis versus Logistic Regression

PubMed Central

Shayan, Zahra; Mezerji, Naser Mohammad Gholi; Shayan, Leila; Naseri, Parisa

2016-01-01

Background: Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, the LDA makes more assumptions about the data. When categorical and continuous variables are used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable to predict the accuracy of the outcome. The present study compared LR and LDA models using classification indices. Methods: This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. Results: CE revealed a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect for sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. Conclusion: The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.

20. Accounting for data errors discovered from an audit in multiple linear regression.

PubMed

Shepherd, Bryan E; Yu, Chang

2011-09-01

A data coordinating team performed onsite audits and discovered discrepancies between the data sent to the coordinating center and that recorded at sites. We present statistical methods for incorporating audit results into analyses. This can be thought of as a measurement error problem, where the distribution of errors is a mixture with a point mass at 0. If the error rate is nonzero, then even if the mean of the discrepancy between the reported and correct values of a predictor is 0, naive estimates of the association between two continuous variables will be biased. We consider scenarios where there are (1) errors in the predictor, (2) errors in the outcome, and (3) possibly correlated errors in the predictor and outcome. We show how to incorporate the error rate and magnitude, estimated from a random subset (the audited records), to compute unbiased estimates of association and proper confidence intervals. We then extend these results to multiple linear regression where multiple covariates may be incorrect in the database and the rate and magnitude of the errors may depend on study site. We study the finite sample properties of our estimators using simulations, discuss some practical considerations, and illustrate our methods with data from 2815 HIV-infected patients in Latin America, of whom 234 had their data audited using a sequential auditing plan. PMID:21281274

1. Linear regression model for predicting interactive mixture toxicity of pesticide and ionic liquid.

PubMed

Qin, Li-Tang; Wu, Jie; Mo, Ling-Yun; Zeng, Hong-Hu; Liang, Yan-Peng

2015-08-01

Most environmental contaminants occur as chemical mixtures rather than as individual chemicals. Most of the existing mixture models are only valid for non-interactive mixture toxicity. Therefore, we built two simple linear regression-based concentration addition (LCA) and independent action (LIA) models that aim to predict the combined toxicities of interactive mixtures. The LCA model was built between the negative log-transformation of experimental and expected effect concentrations of concentration addition (CA), while the LIA model was developed between the negative log-transformation of experimental and expected effect concentrations of independent action (IA). Twenty-four mixtures of pesticide and ionic liquid were used to evaluate the predictive abilities of the LCA and LIA models. The models correlated well with the observed responses of the 24 binary mixtures. The values of the coefficient of determination (R²) and leave-one-out (LOO) cross-validated correlation coefficient (Q²) for the LCA and LIA models are larger than 0.99, which indicates high predictive power of the models. The results showed that the developed LCA and LIA models allow accurate prediction of mixture toxicities involving synergism, additive effect, and antagonism. The proposed LCA and LIA models may serve as a useful tool in ecotoxicological assessment. PMID:25929456
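
The two fit statistics reported in this abstract, R² and the leave-one-out cross-validated Q², can be sketched for a simple linear regression as below. This is an illustration only, not the authors' procedure: the function names and data are hypothetical, and Q² is computed under one common convention, Q² = 1 − PRESS/SS_tot, with SS_tot taken about the mean of all responses.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Coefficient of determination of the line fitted to all points."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def q_squared_loo(xs, ys):
    """Leave-one-out Q^2: refit without point i, then predict point i."""
    mean_y = sum(ys) / len(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot

# Hypothetical expected (x) and experimental (y) values, invented for illustration.
expected = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
experimental = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

print("R^2:", r_squared(expected, experimental))
print("Q^2:", q_squared_loo(expected, experimental))
```

Because each left-out point is predicted by a model that never saw it, Q² cannot exceed R² under this convention, so a Q² close to R² suggests the fit is not driven by individual points.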

2. Fuzzy clustering and soft switching of linear regression models for reversible image compression

Aiazzi, Bruno; Alba, Pasquale S.; Alparone, Luciano; Baronti, Stefano

1998-10-01

This paper describes an original application of fuzzy logic to reversible compression of 2D and 3D data. The compression method consists of a space-variant prediction followed by context-based classification and arithmetic coding of the outcome residuals. Prediction of a pixel to be encoded is obtained from the fuzzy-switching of a set of linear regression predictors. The coefficients of each predictor are calculated so as to minimize prediction MSE for those pixels whose graylevel patterns, lying on a causal neighborhood of prefixed shape, are vectors belonging in a fuzzy sense to one cluster. In the 3D case, pixels both on the current slice and on previously encoded slices may be used. The size and shape of the causal neighborhood, as well as the number of predictors to be switched, may be chosen before running the algorithm and determine the trade-off between coding performances and computational cost. The method exhibits impressive performances, for both 2D and 3D data, mainly thanks to the optimality of predictors, due to their skill in fitting data patterns.

3. A linear merging methodology for high-resolution precipitation products using spatiotemporal regression

SciTech Connect

Turlapaty, Anish C.; Younan, Nicolas H.; Anantharaj, Valentine G

2012-01-01

Currently, the only viable option for a global precipitation product is the merger of several precipitation products from different modalities. In this article, we develop a linear merging methodology based on spatiotemporal regression. Four high-resolution precipitation products (HRPPs), obtained through methods including the Climate Prediction Center's Morphing (CMORPH), Geostationary Operational Environmental Satellite-Based Auto-Estimator (GOES-AE), GOES-Based Hydro-Estimator (GOES-HE) and Self-Calibrating Multivariate Precipitation Retrieval (SCAMPR) algorithms, are used in this study. The merged data are evaluated against the Arkansas Red Basin River Forecast Center's (ABRFC's) ground-based rainfall product. The evaluation is performed using the Heidke skill score (HSS) for four seasons, from summer 2007 to spring 2008, and for two different rainfall detection thresholds. It is shown that the merged data outperform all the other products in seven out of eight cases. A key innovation of this machine learning method is that only 6% of the validation data are used for the initial training. The sensitivity of the algorithm to location, distribution of training data, selection of input data sets and seasons is also analysed and presented.
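
The Heidke skill score used for the evaluation above can be sketched from a 2 × 2 rain/no-rain contingency table built at a chosen detection threshold. The helper names, threshold and rainfall values below are hypothetical and only illustrate the standard formula HSS = 2(ad − bc) / [(a + c)(c + d) + (a + b)(b + d)], where a = hits, b = false alarms, c = misses and d = correct negatives.

```python
def contingency_table(reference, estimate, threshold):
    """Count hits, false alarms, misses and correct negatives at a threshold."""
    hits = false_alarms = misses = correct_negatives = 0
    for ref, est in zip(reference, estimate):
        ref_rain = ref >= threshold
        est_rain = est >= threshold
        if est_rain and ref_rain:
            hits += 1
        elif est_rain and not ref_rain:
            false_alarms += 1
        elif ref_rain:
            misses += 1
        else:
            correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives

def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """HSS: 1 is a perfect forecast, 0 is no better than chance."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    numerator = 2.0 * (a * d - b * c)
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    return numerator / denominator

# Hypothetical rainfall values in mm, invented for illustration only.
gauge = [0.0, 2.5, 0.1, 7.0, 0.0, 1.2, 0.0, 3.3]
merged = [0.0, 1.8, 0.6, 5.5, 0.2, 0.0, 0.0, 2.9]

table = contingency_table(gauge, merged, threshold=0.5)
print("table (a, b, c, d):", table)
print("HSS:", heidke_skill_score(*table))
```

Repeating the calculation for each season and each detection threshold would give one score per case, which is how a comparison across eight cases could be organised.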

4. Robust linear regression model of Ki-67 for mitotic rate in gastrointestinal stromal tumors

PubMed Central

KEMMERLING, RALF; WEYLAND, DENIS; KIESSLICH, TOBIAS; ILLIG, ROMANA; KLIESER, ECKHARD; JÄGER, TARKAN; DIETZE, OTTO; NEUREITER, DANIEL

2014-01-01

Risk stratification of gastrointestinal stromal tumors (GISTs) by tumor size, lymph node and metastasis status is crucially affected by mitotic activity. To date, no studies have quantitatively compared mitotic activity in hematoxylin and eosin (H&E)-stained tissue sections with immunohistochemical markers, such as phosphohistone H3 (PHH3) and Ki-67. According to the TNM guidelines, the mitotic count on H&E sections and immunohistochemical PHH3-stained slides has been assessed per 50 high-power fields of 154 specimens of clinically documented GIST cases. The Ki-67-associated proliferation rate was evaluated on three digitalized hot spots using image analysis. The H&E-based mitotic rate was found to correlate significantly better with Ki-67-assessed proliferation activity than with PHH3-assessed proliferation activity (r=0.780; P<0.01). A linear regression model (analysis of variance; P<0.001) allowed reliable predictions of the H&E-associated mitoses based on the Ki-67 expression alone. Additionally, the Ki-67-associated proliferation revealed a higher and significant impact on the recurrence and metastasis rate of the GIST cases than by the classical H&E-based mitotic rate. The results of the present study indicated that the mitotic rate may be reliably and time-efficiently estimated by immunohistochemistry of Ki-67 using only three hot spots. PMID:24527082

5. The Ω Counter, a Frequency Counter Based on the Linear Regression.

PubMed

Rubiola, Enrico; Lenczner, Michel; Bourgeois, Pierre-Yves; Vernotte, Francois

2016-07-01

This paper introduces the Ω counter, a frequency counter (i.e., a frequency-to-digital converter) based on the linear regression (LR) algorithm on time stamps. We discuss the noise of the electronics. We derive the statistical properties of the Ω counter on rigorous mathematical basis, including the weighted measure and the frequency response. We describe an implementation based on a system on chip, under test in our laboratory, and we compare the Ω counter to the traditional Π and Λ counters. The LR exhibits the optimum rejection of white phase noise, superior to that of the Π and Λ counters. White noise is the major practical problem of wideband digital electronics, both in the instrument internal circuits and in the fast processes, which we may want to measure. With a measurement time τ, the variance is proportional to 1/τ² for the Π counter, and to 1/τ³ for both the Λ and Ω counters. However, the Ω counter has the smallest possible variance, 1.25 dB smaller than that of the Λ counter. The Ω counter finds a natural application in the measurement of the parabolic variance, described in the companion article in this Journal [vol. 63 no. 4 pp. 611-623, April 2016 (Special Issue on the 50th Anniversary of the Allan Variance), DOI 10.1109/TUFFC.2015.2499325]. PMID:27244731
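
The core idea of estimating frequency by linear regression on time stamps can be illustrated with a simplified sketch. It is not the authors' implementation: it assumes one time stamp per input cycle, fits the time stamps against the cycle index by least squares, and takes the reciprocal of the fitted period. The function names, the 10 Hz signal and the noise level are all hypothetical. A two-point estimate from the first and last stamps is included as a rough stand-in for a Π-style counter.

```python
import random

def omega_frequency(timestamps):
    """Frequency from the least-squares slope of time stamps versus cycle index."""
    n = len(timestamps)
    mean_k = (n - 1) / 2.0
    mean_t = sum(timestamps) / n
    skk = sum((k - mean_k) ** 2 for k in range(n))
    skt = sum((k - mean_k) * (t - mean_t) for k, t in enumerate(timestamps))
    period = skt / skk  # fitted seconds per cycle
    return 1.0 / period

def endpoint_frequency(timestamps):
    """Two-point estimate using only the first and last time stamps."""
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])

# Hypothetical 10 Hz signal with white timing noise on every stamp.
rng = random.Random(0)
true_frequency = 10.0
stamps = [k / true_frequency + rng.gauss(0.0, 1e-4) for k in range(1000)]

print("regression estimate:", omega_frequency(stamps))
print("endpoint estimate:  ", endpoint_frequency(stamps))
```

The regression uses every time stamp, so independent timing errors average out more strongly than in the two-point estimate, which is consistent with the faster variance decay the abstract reports for the Ω counter.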

6. Comparison of linear discriminant analysis and logistic regression for data classification

Liong, Choong-Yeun; Foo, Sin-Fan

2013-04-01

Linear discriminant analysis (LDA) and logistic regression (LR) are often used for the purpose of classifying populations or groups using a set of predictor variables. Assumptions of multivariate normality and equal variance-covariance matrices across groups are required before proceeding with LDA, but such assumptions are not required for LR and hence LR is considered to be much more robust than LDA. In this paper, several real datasets which are different in terms of normality, number of independent variables and sample size are used to study the performance of both methods. The methods are compared based on the percentage of correct classification and B index. The results show that overall, LR performs better regardless of whether the distribution of the data is normal or nonnormal. However, LR needs longer computing time than LDA as the sample size increases. The performance of LDA was also tested by using various prior probabilities. The results show that the average percentage of correct classification and the B index are higher when the prior probability is set based on the group size rather than using equal probabilities for all groups.

7. Electricity Consumption in the Industrial Sector of Jordan: Application of Multivariate Linear Regression and Adaptive Neuro-Fuzzy Techniques

Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.

2009-08-01

In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.

8. A non-linear regression method for CT brain perfusion analysis

Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.

2015-03-01

CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-art methods.

9. Bayesian Method for Support Union Recovery in Multivariate Multi-Response Linear Regression

Chen, Wan-Ping

Sparse modeling has become a particularly important and quickly developing topic in many applications of statistics, machine learning, and signal processing. The main objective of sparse modeling is discovering a small number of predictive patterns that would improve our understanding of the data. This paper extends the idea of sparse modeling to the variable selection problem in high dimensional linear regression, where there are multiple response vectors, and they share the same or similar subsets of predictor variables to be selected from a large set of candidate variables. In the literature, this problem is called multi-task learning, support union recovery or simultaneous sparse coding in different contexts. We present a Bayesian method for solving this problem by introducing two nested sets of binary indicator variables. In the first set of indicator variables, each indicator is associated with a predictor variable or a regressor, indicating whether this variable is active for any of the response vectors. In the second set of indicator variables, each indicator is associated with both a predictor variable and a response vector, indicating whether this variable is active for the particular response vector. The problem of variable selection is solved by sampling from the posterior distributions of the two sets of indicator variables. We develop a Gibbs sampling algorithm for posterior sampling and use the generated samples to identify active support at both the shared and individual levels. Theoretical and simulation-based justifications are provided in the paper. The proposed algorithm is also demonstrated on real image data sets. To learn the patterns of the object in images, we treat images as the different tasks. By combining images with the object in the same category, we can not only learn the shared patterns efficiently but also obtain an individual sketch of each image.

10. Optimization of end-members used in multiple linear regression geochemical mixing models

Dunlea, Ann G.; Murray, Richard W.

2015-11-01

Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).

11. Identifying keystone species in the human gut microbiome from metagenomic timeseries using sparse linear regression.

PubMed

Fisher, Charles K; Mehta, Pankaj

2014-01-01

Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called "errors-in-variables". Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct "keystone species", Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut

12. Using EMI, Geospatial Statistics and Multi-Linear Regression for Identifying Areas of Manure Accumulation on Feedlot Surfaces

Technology Transfer Automated Retrieval System (TEKTRAN)

Accumulated feedlot manure negatively affects the environment. The objective was to test the validity of using EMI mapping methods combined with predictive-based sampling and ordinary linear regression for measuring spatially variable manure accumulation. A Dualem-1S EMI meter also recording GPS c...

13. A comparative study between non-linear regression and non-parametric approaches for modelling Phalaris paradoxa seedling emergence

Technology Transfer Automated Retrieval System (TEKTRAN)

Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...

14. A New Test of Linear Hypotheses in OLS Regression under Heteroscedasticity of Unknown Form

ERIC Educational Resources Information Center

Cai, Li; Hayes, Andrew F.

2008-01-01

When the errors in an ordinary least squares (OLS) regression model are heteroscedastic, hypothesis tests involving the regression coefficients can have Type I error rates that are far from the nominal significance level. Asymptotically, this problem can be rectified with the use of a heteroscedasticity-consistent covariance matrix (HCCM)…
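
As background to the heteroscedasticity-consistent covariance matrix mentioned in this record, the sketch below compares the classical standard error of a simple-regression slope with the basic HC0 (White) sandwich version. It illustrates the general idea only and is not the new test proposed by the authors. The function name and data are hypothetical; for a single predictor with an intercept, HC0 reduces to Var(b) = Σ dᵢ²eᵢ² / (Σ dᵢ²)², where dᵢ = xᵢ − x̄ and eᵢ are the OLS residuals.

```python
import math

def slope_standard_errors(xs, ys):
    """Return (slope, classical SE, HC0 SE) for a simple OLS regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (y - mean_y) for d, y in zip(dx, ys)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Classical SE assumes one common error variance for all observations.
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 weights each squared residual by its own squared leverage term.
    hc0 = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, classical, hc0

# Hypothetical data whose scatter grows with x, invented for illustration.
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x + (0.3 * x if x % 2 else -0.3 * x) for x in xs]

slope, se_classical, se_hc0 = slope_standard_errors(xs, ys)
print("slope:", slope)
print("classical SE:", se_classical)
print("HC0 SE:      ", se_hc0)
```

When the two standard errors differ noticeably, tests built on the classical one may not hold their nominal Type I error rate, which is the problem the abstract describes.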

15. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression

ERIC Educational Resources Information Center

2012-01-01

The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…

16. Confidence Intervals for an Effect Size Measure in Multiple Linear Regression

ERIC Educational Resources Information Center

Algina, James; Keselman, H. J.; Penfield, Randall D.

2007-01-01

The increase in the squared multiple correlation coefficient (ΔR²) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. The coverage probability that an asymptotic and percentile bootstrap confidence interval includes Δρ² was investigated. As expected,…

17. An Investigation of the Median-Median Method of Linear Regression

ERIC Educational Resources Information Center

Walters, Elizabeth J.; Morrell, Christopher H.; Auer, Richard E.

2006-01-01

Least squares regression is the most common method of fitting a straight line to a set of bivariate data. Another less known method that is available on Texas Instruments graphing calculators is median-median regression. This method is proposed as a simple method that may be used with middle and high school students to motivate the idea of fitting…
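
The median-median method described above can be sketched as follows. The points are sorted by x and split into three groups, the median x and median y of each group give three summary points, the slope comes from the two outer summary points, and the intercept is averaged over all three. The function name and data are hypothetical, and the rule for group sizes when the number of points is not divisible by three (outer groups kept equal in size) is an assumed convention.

```python
from statistics import median

def median_median_line(xs, ys):
    """Return (slope, intercept) of the median-median line through the data."""
    points = sorted(zip(xs, ys))
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    base, remainder = divmod(n, 3)
    # Keep the two outer groups the same size; any odd point goes to the middle.
    outer = base + (1 if remainder == 2 else 0)
    groups = (points[:outer], points[outer:n - outer], points[n - outer:])
    summary = [
        (median([x for x, _ in g]), median([y for _, y in g])) for g in groups
    ]
    (x_left, y_left), (x_mid, y_mid), (x_right, y_right) = summary
    slope = (y_right - y_left) / (x_right - x_left)
    # Average the intercepts implied by the three summary points.
    intercept = (y_left + y_mid + y_right - slope * (x_left + x_mid + x_right)) / 3.0
    return slope, intercept

# Hypothetical data on the line y = 2x + 1, with one gross outlier at x = 9.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 100.0]

print("slope, intercept:", median_median_line(xs, ys))
```

Because each group is summarised by medians, the single outlier does not move the fitted line here, whereas a least squares fit to the same data would be pulled toward it.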

18. Fixed-Width Confidence Intervals in Linear Regression with Applications to the Johnson-Neyman Technique.

ERIC Educational Resources Information Center

Aitkin, Murray A.

Fixed-width confidence intervals for a population regression line over a finite interval of x have recently been derived by Gafarian. The method is extended to provide fixed-width confidence intervals for the difference between two population regression lines, resulting in a simple procedure analogous to the Johnson-Neyman technique. (Author)

19. Weighted Structural Regression: A Broad Class of Adaptive Methods for Improving Linear Prediction.

ERIC Educational Resources Information Center

Pruzek, Robert M.; Lepak, Greg M.

1992-01-01

Adaptive forms of weighted structural regression are developed and discussed. Bootstrapping studies indicate that the new methods have potential to recover known population regression weights and predict criterion score values routinely better than do ordinary least squares methods. The new methods are scale free and simple to compute. (SLD)

20. Insights into antioxidant activity of 1-adamantylthiopyridine analogs using multiple linear regression.

PubMed

Worachartcheewan, Apilak; Nantasenamat, Chanin; Owasirikul, Wiwat; Monnor, Teerawat; Naruepantawart, Orapan; Janyapaisarn, Sayamon; Prachayasittikul, Supaluk; Prachayasittikul, Virapong

2014-02-12

A data set of 1-adamantylthiopyridine analogs (1-19) with antioxidant activity, comprising 2,2-diphenyl-1-picrylhydrazyl (DPPH) and superoxide dismutase (SOD) activities, was used for constructing quantitative structure-activity relationship (QSAR) models. Molecular structures were geometrically optimized at the B3LYP/6-31g(d) level and subjected to further molecular descriptor calculation using Dragon software. Multiple linear regression (MLR) was employed for the development of QSAR models using 3 significant descriptors (i.e. Mor29e, F04[N-N] and GATS5v) for predicting the DPPH activity and 2 essential descriptors (i.e. EEig06r and Mor06v) for predicting the SOD activity. Such molecular descriptors accounted for the effects and positions of substituent groups (R) on the 1-adamantylthiopyridine ring. The results showed that high atomic electronegativity of a polar substituent group (R = CO2H) afforded high DPPH activity, while substituents with high atomic van der Waals volumes such as R = Br gave high SOD activity. Leave-one-out cross-validation (LOO-CV) and an external test set were used for model validation. Correlation coefficient (QCV) and root mean squared error (RMSECV) of the LOO-CV set for predicting DPPH activity were 0.5784 and 8.3440, respectively, while QExt and RMSEExt of the external test set corresponded to 0.7353 and 4.2721, respectively. Furthermore, QCV and RMSECV values of the LOO-CV set for predicting SOD activity were 0.7549 and 5.6380, respectively. The QSAR model's equation was then used in predicting the SOD activity of tested compounds and these were subsequently verified experimentally. It was observed that the experimental activity was more potent than the predicted activity. Structure-activity relationships of significant descriptors governing antioxidant activity are also discussed. The QSAR models investigated herein are anticipated to be useful in the rational design and development of novel compounds with antioxidant activity. PMID

1. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials

PubMed Central

Hu, L.; Zhang, Z.G.; Mouraux, A.; Iannetti, G.D.

2015-01-01

Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical

2. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials.

PubMed

Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D

2015-05-01

Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical

3. The Prediction Properties of Inverse and Reverse Regression for the Simple Linear Calibration Problem

NASA Technical Reports Server (NTRS)

Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.

2010-01-01

The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
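The two calibration strategies contrasted in this abstract can be sketched in a few lines of Python. The helper names (`fit_line`, `classical_estimate`, `reverse_estimate`) and the data are hypothetical and chosen only for illustration; this is a sketch of the general idea, not the authors' procedure.

```python
def fit_line(u, v):
    """Ordinary least-squares fit v ~ a + b*u; returns (a, b)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    b = s_uv / s_uu
    return mean_v - b * mean_u, b


def classical_estimate(standards, readings, y0):
    """Forward regression (reading on standard), then invert the fitted line."""
    a, b = fit_line(standards, readings)
    return (y0 - a) / b


def reverse_estimate(standards, readings, y0):
    """Reverse regression: regress the standard on the reading and predict directly."""
    c, d = fit_line(readings, standards)
    return c + d * y0


# Hypothetical calibration experiment: known standards and noisy instrument readings.
standards = [1.0, 2.0, 3.0, 4.0, 5.0]
readings = [2.6, 4.4, 6.7, 8.3, 10.6]
new_reading = 9.5

x_classical = classical_estimate(standards, readings, new_reading)
x_reverse = reverse_estimate(standards, readings, new_reading)
```

With noisy data the two estimates differ: the reverse estimate is pulled toward the mean of the standards by a factor equal to the squared sample correlation, while with noiseless data the two coincide.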

4. The performance of robust multivariate regression in simultaneous dependence of variables in linear models

Alih, Ekele; Ong, Hong Choon

2014-07-01

The application of Ordinary Least Squares (OLS) to a single equation assumes, among other things, that the predictor variables are truly exogenous; that is, that there is only one-way causation between the dependent variable yi and the predictor variables xij. If this is not true and the xij's are at the same time determined by yi, the OLS assumption will be violated and a single equation method will give biased and inconsistent parameter estimates. OLS also suffers a huge setback in the presence of contaminated data. In order to rectify these problems, simultaneous equation models have been introduced as well as robust regression. In this paper, we construct a simultaneous equation model with variables that exhibit simultaneous dependence and we propose a robust multivariate regression procedure for estimating the parameters of such models. The performance of the robust multivariate regression procedure was examined and compared with the OLS multivariate regression technique and the Three-Stage Least Squares (3SLS) procedure using a numerical simulation experiment. The robust multivariate regression and 3SLS performed about equally well, and both outperformed OLS, when there was no contamination in the data. When contamination occurred in the data, however, the robust multivariate regression outperformed both 3SLS and OLS.

5. Comparison of some biased estimation methods (including ordinary subset regression) in the linear model

NASA Technical Reports Server (NTRS)

Sidik, S. M.

1975-01-01

Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.

6. BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER’S DISEASE*

PubMed Central

Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G

2015-01-01

The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412

7. Linear regression models, least-squares problems, normal equations, and stopping criteria for the conjugate gradient method

Arioli, M.; Gratton, S.

2012-11-01

Minimum-variance unbiased estimates for linear regression models can be obtained by solving least-squares problems. The conjugate gradient method can be successfully used in solving the symmetric and positive definite normal equations obtained from these least-squares problems. Taking into account the results of Golub and Meurant (1997, 2009) [10,11], Hestenes and Stiefel (1952) [17], and Strakoš and Tichý (2002) [16], which make it possible to approximate the energy norm of the error during the conjugate gradient iterative process, we adapt the stopping criterion introduced by Arioli (2005) [18] to the normal equations taking into account the statistical properties of the underpinning linear regression problem. Moreover, we show how the energy norm of the error is linked to the χ2-distribution and to the Fisher-Snedecor distribution. Finally, we present the results of several numerical tests that experimentally validate the effectiveness of our stopping criteria.
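As a rough illustration of the setting described here, the following pure-Python sketch runs the conjugate gradient method on the normal equations AᵀAx = Aᵀb of a small least-squares problem. It uses a plain relative-residual stopping test, which is an assumption made for simplicity and is not the statistically motivated stopping criterion of the paper; the function name and the data are likewise hypothetical.

```python
def cg_normal_equations(A, b, tol=1e-10, max_iter=None):
    """Minimise ||Ax - b||_2 by conjugate gradients on A^T A x = A^T b.

    A is a list of rows (m x n, full column rank assumed), b a list of length m.
    """
    m, n = len(A), len(A[0])

    def matvec(v):  # A v
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]

    def rmatvec(w):  # A^T w
        return [sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = rmatvec(b)  # residual of the normal equations at x = 0
    p = r[:]
    rs = dot(r, r)
    rs0 = rs
    for _ in range(max_iter or 10 * n):
        # Simple relative-residual test (not the paper's criterion).
        if rs <= (tol ** 2) * rs0:
            break
        Ap = matvec(p)
        alpha = rs / dot(Ap, Ap)  # p^T (A^T A) p equals ||A p||^2
        q = rmatvec(Ap)           # (A^T A) p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Hypothetical straight-line fit: columns of A are the intercept and the regressor.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.1, 2.9, 5.2, 6.8]
coef = cg_normal_equations(A, b)  # [intercept, slope]
```

Forming AᵀA explicitly is avoided by applying A and Aᵀ in turn, which is the usual way to keep the cost per iteration at two matrix-vector products.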

8. Modeling protein tandem mass spectrometry data with an extended linear regression strategy.

PubMed

Liu, Han; Bonner, Anthony J; Emili, Andrew

2004-01-01

Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. The intensity patterns presented in mass spectra are useful information for identification of peptides and proteins. However, widely used algorithms cannot predict the peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large scale protein expression profile modeling. By proving an important technical result, namely that the regression coefficient vector is just the eigenvector corresponding to the least eigenvalue of a space-transformed version of the original data, this extended regression problem can be reduced to an SVD problem, thus gaining robustness and efficiency. To evaluate the performance of our model, from 60,960 spectra we chose 2,859 with high-confidence, non-redundant matches as training data. Based on this specific problem, we derived some measurements of goodness of fit to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes. PMID:17270923

9. A Comparison of Conventional Linear Regression Methods and Neural Networks for Forecasting Educational Spending.

ERIC Educational Resources Information Center

Baker, Bruce D.; Richards, Craig E.

1999-01-01

Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…

10. Creating a non-linear total sediment load formula using polynomial best subset regression model

Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali

2016-08-01

The aim of this study is to derive a new total sediment load formula that is more accurate and has fewer application constraints than the well-known formulae in the literature. The five best-known stream power concept sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach, called polynomial best subset regression (PBSR) analysis. The aim of the PBSR analysis is to fit and test all possible combinations of the input variables and select the best subset. All the input variables, together with their second and third powers, are included in the regression to test the possible relation between the explanatory variables and the dependent variable. The best subset is selected with a multistep approach that depends on significance values and on the degree of multicollinearity among the inputs. The new formula is compared to the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab datasets within this holdout data. Different goodness-of-fit statistics are used because they represent different perspectives on model accuracy. After the detailed comparisons were carried out, we identified the most accurate equation that is also applicable to both flume and river data. In particular, on the field dataset the proposed formula outperformed the benchmark formulations.

11. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

ERIC Educational Resources Information Center

Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

2011-01-01

This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

12. Application of Dynamic Grey-Linear Auto-regressive Model in Time Scale Calculation

Yuan, H. T.; Don, S. W.

2009-01-01

Because of the influence of different noise sources and other factors, the running of an atomic clock is very complex. In order to forecast the velocity of an atomic clock accurately, it is necessary to study and design a model to calculate its velocity in the near future. Using this velocity, the clock could be used in the calculation of local atomic time and the steering of local universal time. In this paper, a new forecast model called the dynamic grey-linear auto-regressive model is studied, and the precision of the new model is given. The new model is tested with real data from the National Time Service Center.

13. Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.

ERIC Educational Resources Information Center

Vidal, Sherry

Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…

14. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models

PubMed Central

Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.

2016-01-01

Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The

15. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models.

PubMed

Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S

2016-01-01

Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0-20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The

16. Estimation of streamflow, base flow, and nitrate-nitrogen loads in Iowa using multiple linear regression models

USGS Publications Warehouse

Schilling, K.E.; Wolter, C.F.

2005-01-01

Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs) and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. Combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture dominated watersheds in the glaciated Midwest. (JAWRA) (Copyright © 2005).

17. Is It the Intervention or the Students? Using Linear Regression to Control for Student Characteristics in Undergraduate STEM Education Research

PubMed Central

Theobald, Roddy; Freeman, Scott

2014-01-01

Although researchers in undergraduate science, technology, engineering, and mathematics education are currently using several methods to analyze learning gains from pre- and posttest data, the most commonly used approaches have significant shortcomings. Chief among these is the inability to distinguish whether differences in learning gains are due to the effect of an instructional intervention or to differences in student characteristics when students cannot be assigned to control and treatment groups at random. Using pre- and posttest scores from an introductory biology course, we illustrate how the methods currently in wide use can lead to erroneous conclusions, and how multiple linear regression offers an effective framework for distinguishing the impact of an instructional intervention from the impact of student characteristics on test score gains. In general, we recommend that researchers always use student-level regression models that control for possible differences in student ability and preparation to estimate the effect of any nonrandomized instructional intervention on student performance. PMID:24591502

18. pKa prediction for acidic phosphorus-containing compounds using multiple linear regression with computational descriptors.

PubMed

Yu, Donghai; Du, Ruobing; Xiao, Ji-Chang

2016-07-01

Ninety-six acidic phosphorus-containing molecules with pKa 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculation of the molecules. The relationship between the experimental pKa values and structural parameters was obtained by multiple linear regression fitting for the training set, and tested with the test set; the R² values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation, which quantitatively describes the influence of structural parameters on pKa and can be used to predict pKa values of similar structures, is significant for the design of new acidic phosphorus-containing extractants. © 2016 Wiley Periodicals, Inc. PMID:27218266

19. Precision Interval Estimation of the Response Surface by Means of an Integrated Algorithm of Neural Network and Linear Regression

NASA Technical Reports Server (NTRS)

Lo, Ching F.

1999-01-01

The integration of Radial Basis Function Networks and Back Propagation Neural Networks with Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modern Design of Experiments. The integrated method is capable of estimating precision intervals, including confidence and prediction intervals. The power of the innovative method has been demonstrated by applying it to a set of wind tunnel test data in the construction of a response surface and the estimation of precision intervals.

20. Application of empirical mode decomposition with local linear quantile regression in financial time series forecasting.

PubMed

Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M

2014-01-01

This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LPQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LPQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices. PMID:25140343

1. Solving Capelin Time Series Ecosystem Problem Using Hybrid ANN-GAs Model and Multiple Linear Regression Model

Eghnam, Karam M.; Sheta, Alaa F.

2008-06-01

Development of accurate models is necessary in critical applications such as prediction. In this paper, a solution to the stock prediction problem of the Barents Sea capelin is introduced using Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) models. The capelin stock in the Barents Sea is one of the largest in the world. It normally maintained a fishery with annual catches of up to 3 million tons. The capelin stock problem has an impact on fish stock development. The proposed prediction model was developed using an ANN with its weights adapted using a Genetic Algorithm (GA). The proposed model was compared to a traditional linear model, the MLR. The results showed that the ANN-GA model produced an overall accuracy 21% better than that of the MLR model.

2. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

NASA Technical Reports Server (NTRS)

Barrett, C. A.

1985-01-01

Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.

3. Deconvolution of antibody affinities and concentrations by non-linear regression analysis of competitive ELISA data.

SciTech Connect

Stevens, F. J.; Bobrovnik, S. A.; Biosciences Division; Palladin Inst. Biochemistry

2007-12-01

Physiological responses of the adaptive immune system are polyclonal in nature whether induced by a naturally occurring infection, by vaccination to prevent infection or, in the case of animals, by challenge with antigen to generate reagents of research or commercial significance. The composition of the polyclonal responses is distinct to each individual or animal and changes over time. Differences exist in the affinities of the constituents and their relative proportion of the responsive population. In addition, some of the antibodies bind to different sites on the antigen, whereas other pairs of antibodies are sterically restricted from concurrent interaction with the antigen. Even if generation of a monoclonal antibody is the ultimate goal of a project, the quality of the resulting reagent is ultimately related to the characteristics of the initial immune response. It is probably impossible to quantitatively parse the composition of a polyclonal response to antigen. However, molecular regression allows further parameterization of a polyclonal antiserum in the context of certain simplifying assumptions. The antiserum is described as consisting of two competing populations of high- and low-affinity and unknown relative proportions. This simple model allows the quantitative determination of representative affinities and proportions. These parameters may be of use in evaluating responses to vaccines, to evaluating continuity of antibody production whether in vaccine recipients or animals used for the production of antisera, or in optimizing selection of donors for the production of monoclonal antibodies.

4. Effective Surfactants Blend Concentration Determination for O/W Emulsion Stabilization by Two Nonionic Surfactants by Simple Linear Regression

PubMed Central

Hassan, A. K.

2015-01-01

In this work, O/W emulsion sets were prepared using different concentrations of two nonionic surfactants. The two surfactants, Tween 80 (HLB = 15.0) and Span 80 (HLB = 4.3), were used in fixed proportions of 0.55:0.45, respectively. The HLB value of the surfactant blends was fixed at 10.185. The surfactant blend concentration ranged from 3% to 19%. For each O/W emulsion set, the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying simple linear regression by least squares to the temperature-conductivity data determines the effective surfactant blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by physical stability centrifugation testing and phase inversion temperature range measurements. The results indicated that the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that the most stable O/W emulsion is determined by finding the maximum R² value when simple linear regression by least squares is applied to the temperature-conductivity data up to 80°; in addition, the true maximum slope is given by the equation that has the maximum R² value. Because conditions change in a more complex formulation, the method for determining the effective surfactant blend concentration was verified by applying it to more complex formulations of 2% O/W miconazole nitrate cream, and the results indicate its reproducibility. PMID:26664063
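The selection rule described in this abstract (fit a straight line of conductivity against temperature for each blend concentration, then keep the concentration whose fit has the largest R²) can be sketched as follows. This is only an illustration: the conductivity readings, the three concentrations shown, and the helper name `fit_line` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2


# Measurement temperatures listed in the abstract (room temperature taken as 25).
temps = [25, 40, 50, 60, 70, 80]

# Hypothetical conductivity readings (arbitrary units), keyed by blend concentration in %.
conductivity = {
    3: [1.0, 1.9, 2.2, 3.4, 3.5, 4.6],
    11: [1.0, 1.6, 2.0, 2.4, 2.8, 3.2],
    19: [1.0, 1.2, 2.1, 2.0, 3.3, 3.1],
}

# One regression per concentration; the candidate with the highest R² is selected.
fits = {c: fit_line(temps, y) for c, y in conductivity.items()}
best = max(fits, key=lambda c: fits[c][2])
print("selected concentration (%):", best)
```

In this toy data set the 11% series is exactly linear, so it is the one selected; with real measurements the R² values would have to be compared alongside the stability tests the abstract mentions.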

5. Effective Surfactants Blend Concentration Determination for O/W Emulsion Stabilization by Two Nonionic Surfactants by Simple Linear Regression.

PubMed

Hassan, A K

2015-01-01

In this work, O/W emulsion sets were prepared using different concentrations of two nonionic surfactants. The two surfactants, Tween 80 (HLB = 15.0) and Span 80 (HLB = 4.3), were used in fixed proportions of 0.55:0.45, respectively. The HLB value of the surfactant blends was fixed at 10.185. The surfactant blend concentration ranged from 3% to 19%. For each O/W emulsion set, the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying simple linear regression by least squares to the temperature-conductivity data determines the effective surfactant blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by physical stability centrifugation testing and phase inversion temperature range measurements. The results indicated that the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that the most stable O/W emulsion is determined by finding the maximum R² value when simple linear regression by least squares is applied to the temperature-conductivity data up to 80°; in addition, the true maximum slope is given by the equation that has the maximum R² value. Because conditions change in a more complex formulation, the method for determining the effective surfactant blend concentration was verified by applying it to more complex formulations of 2% O/W miconazole nitrate cream, and the results indicate its reproducibility. PMID:26664063

6. Use of Path Analysis and Path Diagrams as a Means of Understanding Regression, Factor Analysis, and Other Linear Structural Relations (LISREL) Models.

ERIC Educational Resources Information Center

Phillips, Gary W.

The usefulness of path analysis as a means of better understanding various linear models is demonstrated. First, two linear models are presented in matrix form using linear structural relations (LISREL) notation. The two models, regression and factor analysis, are shown to be identical although the research question and data matrix to which these…

7. Databased comparison of Sparse Bayesian Learning and Multiple Linear Regression for statistical downscaling of low flow indices

Joshi, Deepti; St-Hilaire, André; Daigle, Anik; Ouarda, Taha B. M. J.

2013-04-01

Summary: This study attempts to compare the performance of two statistical downscaling frameworks in downscaling hydrological indices (descriptive statistics) characterizing the low flow regimes of three rivers in Eastern Canada - Moisie, Romaine and Ouelle. The statistical models selected are Relevance Vector Machine (RVM), an implementation of Sparse Bayesian Learning, and the Automated Statistical Downscaling tool (ASD), an implementation of Multiple Linear Regression. Inputs to both frameworks involve climate variables significantly (α = 0.05) correlated with the indices. These variables were processed using Canonical Correlation Analysis and the resulting canonical variates scores were used as input to RVM to estimate the selected low flow indices. In ASD, the significantly correlated climate variables were subjected to backward stepwise predictor selection and the selected predictors were subsequently used to estimate the selected low flow indices using Multiple Linear Regression. With respect to the correlation between climate variables and the selected low flow indices, it was observed that all indices are influenced, primarily, by wind components (Vertical, Zonal and Meridional) and humidity variables (Specific and Relative Humidity). The downscaling performance of the framework involving RVM was found to be better than ASD in terms of Relative Root Mean Square Error, Relative Mean Absolute Bias and Coefficient of Determination. In all cases, the former resulted in less variability of the performance indices between calibration and validation sets, implying better generalization ability than for the latter.

8. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

NASA Technical Reports Server (NTRS)

Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

1998-01-01

The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.

9. Nucleus detection using gradient orientation information and linear least squares regression

Kwak, Jin Tae; Hewitt, Stephen M.; Xu, Sheng; Pinto, Peter A.; Wood, Bradford J.

2015-03-01

Computerized histopathology image analysis enables an objective, efficient, and quantitative assessment of digitized histopathology images. Such analysis often requires an accurate and efficient detection and segmentation of histological structures such as glands, cells and nuclei. The segmentation is used to characterize tissue specimens and to determine the disease status or outcomes. The segmentation of nuclei, in particular, is challenging due to the overlapping or clumped nuclei. Here, we propose a nuclei seed detection method for the individual and overlapping nuclei that utilizes the gradient orientation or direction information. The initial nuclei segmentation is provided by a multiview boosting approach. The angle of the gradient orientation is computed and traced for the nuclear boundaries. Taking the first derivative of the angle of the gradient orientation, high concavity points (junctions) are discovered. False junctions are found and removed by adopting a greedy search scheme with the goodness-of-fit statistic in a linear least squares sense. Then, the junctions determine boundary segments. Partial boundary segments belonging to the same nucleus are identified and combined by examining the overlapping area between them. Using the final set of the boundary segments, we generate the list of seeds in tissue images. The method achieved an overall precision of 0.89 and a recall of 0.88 in comparison to the manual segmentation.

10. A componential model of human interaction with graphs: 1. Linear regression modeling

NASA Technical Reports Server (NTRS)

Gillan, Douglas J.; Lewis, Robert

1994-01-01

Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes: searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and responding; and (2) that the type of graph and user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications in the MA-P model, alternative models, and design implications from the MA-P model.

11. The overlooked potential of Generalized Linear Models in astronomy-II: Gamma regression and photometric redshifts

Elliott, J.; de Souza, R. S.; Krone-Martins, A.; Cameron, E.; Ishida, E. E. O.; Hilbe, J.

2015-04-01

Machine learning techniques offer a precious tool box for use within astronomy to solve problems involving so-called big data. They provide a means to make accurate predictions about a particular system without prior knowledge of the underlying physical processes of the data. In this article, and the companion papers of this series, we present the set of Generalized Linear Models (GLMs) as a fast alternative method for tackling general astronomical problems, including the ones related to the machine learning paradigm. To demonstrate the applicability of GLMs to inherently positive and continuous physical observables, we explore their use in estimating the photometric redshifts of galaxies from their multi-wavelength photometry. Using the gamma family with a log link function we predict redshifts from the PHoto-z Accuracy Testing simulated catalogue and a subset of the Sloan Digital Sky Survey from Data Release 10. We obtain fits that result in catastrophic outlier rates as low as ∼1% for simulated and ∼2% for real data. Moreover, we can easily obtain such levels of precision within a matter of seconds on a normal desktop computer and with training sets that contain merely thousands of galaxies. Our software is made publicly available as a user-friendly package developed in Python, R and via an interactive web application. This software allows users to apply a set of GLMs to their own photometric catalogues and generates publication quality plots with minimum effort. By facilitating their ease of use to the astronomical community, this paper series aims to make GLMs widely known and to encourage their implementation in future large-scale projects, such as the Large Synoptic Survey Telescope.

12. The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

SciTech Connect

2009-11-15

Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing a total of 50 hospitals of Fars province (Iran) was used to verify the performance of the models. Three performance measures including MAR, RMSE and R² were used to evaluate performance of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R² confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.

13. Linear and nonlinear modeling of antifungal activity of some heterocyclic ring derivatives using multiple linear regression and Bayesian-regularized neural networks.

PubMed

Caballero, Julio; Fernández, Michael

2006-01-01

Antifungal activity was modeled for a set of 96 heterocyclic ring derivatives (2,5,6-trisubstituted benzoxazoles, 2,5-disubstituted benzimidazoles, 2-substituted benzothiazoles and 2-substituted oxazolo(4,5-b)pyridines) using multiple linear regression (MLR) and Bayesian-regularized artificial neural network (BRANN) techniques. Inhibitory activity against Candida albicans (log(1/C)) was correlated with 3D descriptors encoding the chemical structures of the heterocyclic compounds. Training and test sets were chosen by means of k-Means Clustering. The most appropriate variables for linear and nonlinear modeling were selected using a genetic algorithm (GA) approach. In addition to the MLR equation (MLR-GA), two nonlinear models were built, model BRANN employing the linear variable subset and an optimum model BRANN-GA obtained by a hybrid method that combined BRANN and GA approaches (BRANN-GA). The linear model fit the training set (n = 80) with r² = 0.746, while BRANN and BRANN-GA gave higher values of r² = 0.889 and r² = 0.937, respectively. Beyond the improvement of training set fitting, the BRANN-GA model was superior to the others by being able to describe 87% of test set (n = 16) variance in comparison with 78 and 81% for the MLR-GA and BRANN models, respectively. Our quantitative structure-activity relationship study suggests that the distributions of atomic mass, volume and polarizability have relevant relationships with the antifungal potency of the compounds studied. Furthermore, the ability of the six variables selected nonlinearly to differentiate the data was demonstrated when the total data set was well distributed in a Kohonen self-organizing neural network (KNN). PMID:16205958

14. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

2016-01-01

Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

15. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

dos Santos, T. S.; Mendes, D.; Torres, R. R.

2015-08-01

Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used GCM experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.

16. Hall-Petch and multiple linear regression equations for the prediction of mechanical properties in gamma-based titanium aluminides

SciTech Connect

Soboyejo, W.O.; Soboyejo, A.B.O.; Ni, Y.; Mercer, C.

1997-12-31

In a recent paper, Mercer and Soboyejo demonstrated the Hall-Petch dependence of basic room- and elevated-temperature (815 °C) mechanical properties (0.2% offset strength, ultimate tensile strength, plastic elongation to failure and fracture toughness) on the average equiaxed/lamellar grain size. Simple Hall-Petch behavior was shown to occur in a wide range of extruded duplex α₂+γ alloys (Ti-48Al, Ti-48Al-1.4Mn, Ti-48Al-2Mn and Ti-48Al-1.5Cr). As in steels and other materials, simple Hall-Petch equations were derived for the above properties. However, the Hall-Petch equations did not include the effect of other variables that can affect the basic mechanical properties of gamma alloys. Multiple linear regression equations for the prediction of the combined effects of several (alloying, microstructure and temperature) variables on basic mechanical properties temperature are presented in this paper.
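For orientation, the textbook Hall-Petch relation and a generic multiple linear regression form can be written as below. The symbols are assumptions chosen for illustration and are not taken from the paper: $\sigma$ is a strength-type property, $d$ the average grain size, $\sigma_0$ and $k$ fitted constants, and $x_1, \dots, x_m$ stand for alloying, microstructure and temperature variables.

```latex
% Classical Hall-Petch form: property versus inverse square root of grain size
\sigma = \sigma_0 + k \, d^{-1/2}

% Generic multiple linear regression extension with m additional predictors
\sigma = \beta_0 + \beta_1 \, d^{-1/2} + \sum_{j=1}^{m} \beta_{j+1} \, x_j + \varepsilon
```

Setting every $\beta_{j+1}$ to zero recovers the simple Hall-Petch equation, which is one way to read the abstract's remark that the simple equations omit other variables.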

17. Evaluation of the multiple linear regression method to monitor respiratory mechanics in ventilated neonates and young children.

PubMed

Rousselot, J M; Peslin, R; Duvivier, C

1992-07-01

A potentially useful method to monitor respiratory mechanics in artificially ventilated patients consists of analyzing the relationship between tracheal pressure ($P$), lung volume ($V$), and gas flow ($\dot{V}$) by multiple linear regression (MLR) using a suitable model. Contrary to other methods, it does not require any particular flow waveform and, therefore, may be used with any ventilator. This approach was evaluated in three neonates and seven young children admitted into an intensive care unit for respiratory disorders of various etiologies. P and V were measured and digitized at a sampling rate of 40 Hz for periods of 20-48 s. After correction of $P$ for the non-linear resistance of the endotracheal tube, the data were first analyzed with the usual linear monoalveolar model: $P = P_0 + E \cdot V + R \cdot \dot{V}$, where $E$ and $R$ are total respiratory elastance and resistance, and $P_0$ is the static recoil pressure at end-expiration. A good fit of the model to the data was seen in five of ten children. $P_0$, $E$, and $R$ were reproducible within cycles, and consistent with the patient's age and condition; the data obtained with two ventilatory modes were highly correlated. In the five instances in which the simple model did not fit the data well, they were reanalyzed with more sophisticated models allowing for mechanical non-homogeneity or for non-linearity of $R$ or $E$. While several models substantially improved the fit, physiologically meaningful results were only obtained when $R$ was allowed to change with lung volume. We conclude that the MLR method is adequate to monitor respiratory mechanics, even when the usual model is inadequate. PMID:1437330
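A minimal sketch of how the linear single-compartment model in this abstract (pressure equals an offset plus elastance times volume plus resistance times flow) might be fitted by multiple linear regression is shown below. The signals are synthetic and noise-free, the parameter values and units are arbitrary, and the function names are hypothetical; only the 40 Hz sampling rate and the 20 s record length echo the abstract.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_respiratory_model(pressure, volume, flow):
    """Least-squares estimates (P0, E, R) for P = P0 + E*V + R*flow."""
    rows = [[1.0, v, f] for v, f in zip(volume, flow)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(rows, pressure)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Synthetic record: 20 s sampled at 40 Hz, sinusoidal breathing at 0.5 Hz.
fs = 40
t = [k / fs for k in range(20 * fs)]
omega = 2 * math.pi * 0.5
volume = [0.05 * (1 - math.cos(omega * tk)) for tk in t]
flow = [0.05 * omega * math.sin(omega * tk) for tk in t]  # time derivative of volume

# Assumed "true" parameters used only to generate the synthetic pressure signal.
true_p0, true_e, true_r = 3.0, 60.0, 25.0
pressure = [true_p0 + true_e * v + true_r * f for v, f in zip(volume, flow)]

p0, e, r = fit_respiratory_model(pressure, volume, flow)
print(p0, e, r)
```

Because the synthetic data contain no noise, the fit recovers the generating parameters; with measured signals the residuals would indicate whether the simple model is adequate, which is the question the abstract addresses for five of the ten children.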

18. GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies

PubMed Central

2013-01-01

Background: Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many Gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code. Results: We improve speed: computation time from 6 hours is reduced to 10-15 minutes. Our approach can handle essentially an unlimited number of covariates efficiently, using projections. Data files in GWAS are vast and reading them into computer memory becomes an important issue. However, much improvement can be made if the data is structured beforehand in a way allowing for easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf. We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures. Conclusions: We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access. PMID:23711206
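The projection idea mentioned in this abstract can be illustrated in a very reduced form: remove the covariates from the phenotype and from every SNP once, then obtain each SNP's slope from simple dot products instead of refitting a full model per SNP. The sketch below assumes the only covariate is an intercept, so the projection is just centering. It is written in plain Python rather than the authors' R code, and the data and function names are hypothetical.

```python
def center(values):
    """Project out the intercept (the only covariate in this sketch) by centering."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def snp_slopes(phenotype, genotypes):
    """Slope of the phenotype on each SNP, adjusted for an intercept.

    genotypes is a list of SNP columns, each a list of 0/1/2 allele counts.
    """
    y = center(phenotype)  # residualize the phenotype once, not once per SNP
    slopes = []
    for snp in genotypes:
        x = center(snp)
        sxx = sum(xi * xi for xi in x)
        sxy = sum(xi * yi for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    return slopes


# Hypothetical toy data: 6 individuals, 2 SNPs.
phenotype = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 0, 0, 1, 1, 1],
]

slopes = snp_slopes(phenotype, genotypes)
print(slopes)
```

With more covariates, centering would be replaced by a projection onto the orthogonal complement of the covariate matrix, and the per-SNP loop by matrix products over blocks of SNPs, which is where the speed-up described in the abstract would come from.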

19. Determination of Cefoperazone Sodium in Presence of Related Impurities by Linear Support Vector Regression and Partial Least Squares Chemometric Models

PubMed Central

Naguib, Ibrahim A.; Abdelaleem, Eglal A.; Zaazaa, Hala E.; Hussein, Essraa A.

2015-01-01

A comparison between partial least squares regression and support vector regression chemometric models is introduced in this study. The two models are implemented to analyze cefoperazone sodium in presence of its reported impurities, 7-aminocephalosporanic acid and 5-mercapto-1-methyl-tetrazole, in pure powders and in pharmaceutical formulations through processing UV spectroscopic data. For best results, a 3-factor 4-level experimental design was used, resulting in a training set of 16 mixtures containing different ratios of interfering moieties. For method validation, an independent test set consisting of 9 mixtures was used to test predictive ability of established models. The introduced results show the capability of the two proposed models to analyze cefoperazone in presence of its impurities 7-aminocephalosporanic acid and 5-mercapto-1-methyl-tetrazole with high trueness and selectivity (101.87 ± 0.708 and 101.43 ± 0.536 for PLSR and linear SVR, resp.). Analysis results of drug products were statistically compared to a reported HPLC method showing no significant difference in trueness and precision, indicating the capability of the suggested multivariate calibration models to be reliable and adequate for routine quality control analysis of drug product. SVR offers more accurate results with lower prediction error compared to PLSR model; however, PLSR is easy to handle and fast to optimize. PMID:26664764

20. Effect of source variation on drug release from HPMC tablets: linear regression modeling for prediction of drug release.

PubMed

2011-06-15

The aim of this study was to investigate the effect of source variation of hydroxypropyl methylcellulose (HPMC) raw material on prediction of drug release from HPMC matrix tablets. To achieve this objective, the flowability (i.e., angle of repose and Carr's compressibility index) and apparent viscosity of HPMC from 3 sources were investigated to differentiate HPMC source variation. The physicochemical properties of drug and manufacturing process were also incorporated to develop the linear regression model for prediction of drug release. Specifically, the in vitro release of 18 formulations was determined according to a 2 × 3 × 3 full factorial design. Further regression analysis provided a quantitative relationship between the response and the studied independent variables. It was found that either apparent viscosity or Carr's compressibility index of HPMC powders, combined with solubility and molecular weight of drug, had a significant impact on the release behavior of drug. Increased drug release was observed with an increase in drug solubility and a decrease in the molecular weight of drug. Most importantly, this study has shown that HPMC having low viscosity or a high compressibility index resulted in an increase of drug release, especially in the case of poorly soluble drugs. PMID:21420475

1. Assessing the risk of bovine fasciolosis using linear regression analysis for the state of Rio Grande do Sul, Brazil.

PubMed

Silva, Ana Elisa Pereira; Freitas, Corina da Costa; Dutra, Luciano Vieira; Molento, Marcelo Beltrão

2016-02-15

Fasciola hepatica is the causative agent of fasciolosis, a disease that triggers a chronic inflammatory process in the liver affecting mainly ruminants and other animals including humans. In Brazil, F. hepatica occurs in larger numbers in the southernmost state of Rio Grande do Sul. The objective of this study was to estimate areas at risk using an eight-year (2002-2010) time series of climatic and environmental variables that best relate to the disease, applying a linear regression method to municipalities in the state of Rio Grande do Sul. The positivity index of the disease, which is the rate of infected animals per slaughtered animal, was divided into three risk classes: low, medium and high. The accuracy of the known sample classification on the confusion matrix for the low, medium and high rates produced by the estimated model presented values between 39 and 88% depending on the year. The regression analysis showed the importance of the time-based data for the construction of the model, considering the two variables of the previous year of the event (positivity index and maximum temperature). The generated data is important for epidemiological and parasite control studies mainly because F. hepatica is an infection that can last from months to years. PMID:26827853

2. QSAR study of HCV NS5B polymerase inhibitors using the genetic algorithm-multiple linear regression (GA-MLR)

PubMed Central

2016-01-01

Quantitative structure-activity relationship (QSAR) study has been employed for predicting the inhibitory activities of the Hepatitis C virus (HCV) NS5B polymerase inhibitors. A data set consisting of 72 compounds was selected, and then different types of molecular descriptors were calculated. The whole data set was split into a training set (80 % of the dataset) and a test set (20 % of the dataset) using principal component analysis. The stepwise (SW) and the genetic algorithm (GA) techniques were used as variable selection tools. Multiple linear regression method was then used to linearly correlate the selected descriptors with inhibitory activities. Several validation techniques, including leave-one-out and leave-group-out cross-validation and the Y-randomization method, were used to evaluate the internal capability of the derived models. The external prediction ability of the derived models was further analyzed using modified r², concordance correlation coefficient values and the Golbraikh and Tropsha acceptable model criteria. Based on the derived results (GA-MLR), some new insights toward molecular structural requirements for obtaining better inhibitory activity were obtained. PMID:27065774
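Leave-one-out cross-validation, one of the internal validation techniques named in this abstract, can be sketched as follows. To stay short, the example regresses activity on a single descriptor rather than on several selected descriptors, and it uses the common definition q² = 1 - PRESS / SS_total. The data values and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_q2(x, y):
    """Leave-one-out cross-validated q² = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict the held-out compound.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and activities for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

q2 = loo_q2(descriptor, activity)
print(round(q2, 3))
```

A q² close to 1 means the held-out predictions track the observed activities well; Y-randomization, also mentioned in the abstract, would repeat the same calculation after shuffling `activity` to check that a high q² does not arise by chance.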

3. Age-Adjustment and Related Epidemiology Rates in Education and Research

ERIC Educational Resources Information Center

Baker, John D.; Kruckman, Laurence; George, Joyce

2006-01-01

A quick review of introductory textbooks reveals that while gerontology authors and instructors introduce some aspect of demography and epidemiology data, there is limited focus on age adjustment or other important epidemiology rates. The goal of this paper is to reintroduce a variety of basic epidemiology strategies such as incidence, prevalence,…

4. Comparing Least Squares and Robust Methods in Linear Regression Analysis of the Discharge of the Flathead River, Northwestern Montana.

Bell, A. L.; Moore, J. N.; Greenwood, M. C.

2007-12-01

The Flathead River in Northwestern Montana drains the relatively pristine, high-mountain watersheds of Glacier-Waterton national parks and large wilderness areas making it an excellent test-bed for hydrologic response to climate change. Flows in the North Fork and Middle Fork of the Flathead River are relatively unmodified by humans, whereas the South Fork has a large hydroelectric reservoir (Hungry Horse) in the lower end of the basin. USGS stream gage data for the North, Middle and South forks from 1940 to 2006 were analyzed for significant trends in the timing of quantiles of flow to examine climate forcing vs. direct modification of flow from the dam. The trends in timing were analyzed for climate change influences using the PRISM model output for 1940 to 2006 for the respective basin. The analysis of trends in timing employed two linear regression methods, typical least squares estimation and robust estimation using weighted least squares. Least squares estimation is the standard method employed when performing regression analysis. The power of this method is sensitive to the violation of the assumptions of normally distributed errors with constant variance (homoscedasticity). Considering that violations of these assumptions are common in hydrologic data, robust estimation was used to preserve the desired statistical power because it is not significantly affected by non-normality or heteroscedasticity. Least squares estimated trends that were found to be significant, using a 10% significance level, were typically not significant using a robust estimation method. This could have implications for interpreting the meaning of significant trends found using the least squares estimator. Utilizing robust estimation methods for analyzing hydrologic data may allow investigators to more accurately summarize any trends.
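The contrast between ordinary least squares and a robust fit based on weighted least squares can be sketched as below. The abstract does not say which robust estimator was used, so the Huber weight function, the tuning constant 1.345 and the MAD-based scale are assumptions made for illustration; the trend data are synthetic.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least squares fit y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(x, y, k=1.345, iterations=50):
    """Robust fit by iteratively reweighted least squares with Huber weights."""
    a, b = weighted_line(x, y, [1.0] * len(x))  # start from the OLS fit
    for _ in range(iterations):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual rescaled for normal errors.
        scale = median(abs(r) for r in resid) / 0.6745 or 1e-12
        # Full weight for small residuals, reduced weight for large ones.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a, b = weighted_line(x, y, w)
    return a, b


# Synthetic timing series: a steady trend of -0.5 days per year plus one outlier.
years = list(range(10))
timing = [100.0 - 0.5 * yr for yr in years]
timing[9] = 120.0  # single anomalous year

ols = weighted_line(years, timing, [1.0] * len(years))
robust = huber_line(years, timing)
print("OLS slope:", ols[1], "robust slope:", robust[1])
```

In this toy series the single anomalous year is enough to flip the sign of the least squares slope, while the reweighted fit stays close to the underlying trend, which is the kind of sensitivity the abstract cautions about.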

5. Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression

USGS Publications Warehouse

Kokaly, R.F.; Clark, R.N.

1999-01-01

We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R² from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.
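A minimal sketch of the band-depth step mentioned in this abstract follows. It assumes a common formulation: a straight-line continuum between the two endpoints of the absorption feature, band depth equal to one minus the continuum-removed reflectance, and normalization by the depth at the band center. The function name and the reflectance values are hypothetical.

```python
def normalized_band_depths(wavelengths, reflectance):
    """Continuum-remove one absorption feature and normalize its band depths."""
    w0, w1 = wavelengths[0], wavelengths[-1]
    r0, r1 = reflectance[0], reflectance[-1]
    # Assumed continuum: straight line joining the feature's two endpoints.
    continuum = [r0 + (r1 - r0) * (w - w0) / (w1 - w0) for w in wavelengths]
    # Band depth = 1 - continuum-removed reflectance.
    depths = [1.0 - r / c for r, c in zip(reflectance, continuum)]
    # Normalize by the depth at the band center (the deepest point).
    center_depth = max(depths)
    return [d / center_depth for d in depths]


# Hypothetical feature near 2.10 micrometers.
wavelengths = [2.00, 2.05, 2.10, 2.15, 2.20]
reflectance = [0.50, 0.45, 0.40, 0.46, 0.52]
nbd = normalized_band_depths(wavelengths, reflectance)
print([round(v, 3) for v in nbd])
```

The normalized depths, rather than raw reflectance, would then serve as candidate predictors in a stepwise regression against measured concentrations.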

6. Predicting density functional theory total energies and enthalpies of formation of metal-nonmetal compounds by linear regression

Deml, Ann M.; O'Hayre, Ryan; Wolverton, Chris; Stevanović, Vladan

2016-02-01

The availability of quantitatively accurate total energies (Etot) of atoms, molecules, and solids, enabled by the development of density functional theory (DFT), has transformed solid state physics, quantum chemistry, and materials science by allowing direct calculations of measurable quantities, such as enthalpies of formation (ΔHf). Still, the ability to compute Etot and ΔHf values does not, necessarily, provide insights into the physical mechanisms behind their magnitudes or chemical trends. Here, we examine a large set of calculated Etot and ΔHf values obtained from the DFT+U-based fitted elemental-phase reference energies (FERE) approach [V. Stevanović, S. Lany, X. Zhang, and A. Zunger, Phys. Rev. B 85, 115104 (2012), 10.1103/PhysRevB.85.115104] to probe relationships between the Etot/ΔHf of metal-nonmetal compounds in their ground-state crystal structures and properties describing the compound compositions and their elemental constituents. From a stepwise linear regression, we develop a linear model for Etot, and consequently ΔHf, that reproduces calculated FERE values with a mean absolute error of ~80 meV/atom. The most significant contributions to the model include calculated total energies of the constituent elements in their reference phases (e.g., metallic iron or gas phase O2), atomic ionization energies and electron affinities, Pauling electronegativity differences, and atomic electric polarizabilities. These contributions are discussed in the context of their connection to the underlying physics. We also demonstrate that our Etot/ΔHf model can be directly extended to predict the Etot and ΔHf of compounds outside the set used to develop the model.

7. MODELING IN VITRO INHIBITION OF BUTYRYLCHOLINESTERASE USING MOLECULAR DOCKING, MULTI-LINEAR REGRESSION AND ARTIFICIAL NEURAL NETWORK APPROACHES

PubMed Central

Zheng, Fang; Zhan, Max; Huang, Xiaoqin; AbdulHameed, Mohamed Diwan M.; Zhan, Chang-Guo

2013-01-01

Butyrylcholinesterase (BChE) has been an important protein used for development of anti-cocaine medication. Through computational design, BChE mutants with ~2000-fold improved catalytic efficiency against cocaine have been discovered in our lab. To study drug-enzyme interaction, it is important to build a mathematical model to predict molecular inhibitory activity against BChE. This report presents a neural network (NN) QSAR study, compared with multi-linear regression (MLR) and molecular docking, on a set of 93 small molecules that act as inhibitors of BChE, using the inhibitory activities (pIC50 values) of the molecules as target values. The statistical results for the linear model built from docking-generated energy descriptors were: r2 = 0.67, rmsd = 0.87, q2 = 0.65 and loormsd = 0.90; the statistical results for the ligand-based MLR model were: r2 = 0.89, rmsd = 0.51, q2 = 0.85 and loormsd = 0.58; the statistical results for the ligand-based NN model were the best: r2 = 0.95, rmsd = 0.33, q2 = 0.90 and loormsd = 0.48, demonstrating that the NN is powerful in analysis of a set of complicated data. As BChE is also an established drug target for developing new treatments for Alzheimer’s disease (AD), the developed QSAR models provide tools for rationalizing identification of potential BChE inhibitors or selection of compounds for synthesis in the discovery of novel effective inhibitors of BChE in the future. PMID:24290065
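The four statistics quoted in this abstract (r2, rmsd, q2, loormsd) can be illustrated with a small sketch. It assumes one common set of definitions: q2 = 1 - PRESS/SS_tot and loormsd = sqrt(PRESS/n), where PRESS is the sum of squared leave-one-out prediction errors. The abstract does not spell out its formulas, and the single descriptor and the activity values below are hypothetical.

```python
from math import sqrt


def line_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def fit_statistics(x, y):
    """Return (r2, rmsd, q2, loo_rmsd) for a one-descriptor linear model."""
    n = len(x)
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)

    a, b = line_fit(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    # Leave-one-out: refit without point i, then predict point i.
    press = 0.0
    for i in range(n):
        a_i, b_i = line_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a_i + b_i * x[i])) ** 2

    return 1 - rss / ss_tot, sqrt(rss / n), 1 - press / ss_tot, sqrt(press / n)


# Hypothetical descriptor values and activities (pIC50-like numbers).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
activity = [4.1, 4.9, 5.2, 6.1, 6.3, 7.2, 7.4, 8.3]
r2, rmsd, q2, loo_rmsd = fit_statistics(descriptor, activity)
print(f"r2={r2:.3f} rmsd={rmsd:.3f} q2={q2:.3f} loormsd={loo_rmsd:.3f}")
```

For a least-squares fit the leave-one-out errors are never smaller than the fitted residuals, so q2 does not exceed r2, matching the ordering of the values reported for each model above.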

8. Predicting Distribution and Inter-Annual Variability of Tropical Cyclone Intensity from a Stochastic, Multiple-Linear Regression Model

Lee, C. Y.; Tippett, M. K.; Sobel, A. H.; Camargo, S. J.

2014-12-01

We are working towards the development of a new statistical-dynamical downscaling system to study the influence of climate on tropical cyclones (TCs). The first step is development of an appropriate model for TC intensity as a function of environmental variables. We approach this issue with a stochastic model consisting of a multiple linear regression model (MLR) for 12-hour intensity forecasts as a deterministic component, and a random error generator as a stochastic component. Similar to the operational Statistical Hurricane Intensity Prediction Scheme (SHIPS), MLR relates the surrounding environment to storm intensity, but with only essential predictors calculated from monthly-mean NCEP reanalysis fields (potential intensity, shear, etc.) and from persistence. The deterministic MLR is developed with data from 1981-1999 and tested with data from 2000-2012 for the Atlantic, Eastern North Pacific, Western North Pacific, Indian Ocean, and Southern Hemisphere basins. While the global MLR's skill is comparable to that of the operational statistical models (e.g., SHIPS), the distribution of the predicted maximum intensity from deterministic results has a systematic low bias compared to observations; the deterministic MLR creates almost no storms with intensities greater than 100 kt. The deterministic MLR can be significantly improved by adding the stochastic component, based on the distribution of random forecasting errors from the deterministic model compared to the training data. This stochastic component may be thought of as representing the component of TC intensification that is not linearly related to the environmental variables. We find that in order for the stochastic model to accurately capture the observed distribution of maximum storm intensities, the stochastic component must be auto-correlated across 12-hour time steps. This presentation also includes a detailed discussion of the distributions of other TC-intensity related quantities, as well as the inter

9. Prediction of the processing factor for pesticides in apple juice by principal component analysis and multiple linear regression.

PubMed

Martin, L; Mezcua, M; Ferrer, C; Gil Garcia, M D; Malato, O; Fernandez-Alba, A R

2013-01-01

The main objective of this work was to establish a mathematical function that correlates pesticide residue levels in apple juice with the levels of the pesticides applied on the raw fruit, taking into account some of their physicochemical properties such as water solubility, the octanol/water partition coefficient, the organic carbon partition coefficient, vapour pressure and density. A mixture of 12 pesticides was applied to an apple tree; apples were collected 10 days after application. After harvest, apples were treated with a mixture of three post-harvest pesticides and the fruits were then processed in order to obtain apple juice following a routine industrial process. The pesticide residue levels in the apple samples were analysed using two multi-residue methods based on LC-MS/MS and GC-MS/MS. The concentration of pesticides was determined in samples derived from the different steps of processing. The processing factors (the ratio between the residue level in the processed commodity and the residue level in the commodity to be processed) obtained for the full juicing process were found to vary among the different pesticides studied. In order to investigate the relationships between the levels of pesticide residue found in apple juice samples and their physicochemical properties, principal component analysis (PCA) was performed using two sets of samples (one using experimental data obtained in this work and the other including data taken from the literature). In both cases a correlation was found between the processing factors of pesticides in the apple juice and the negative logarithms (base 10) of the water solubility, octanol/water partition coefficient and organic carbon partition coefficient. The linear correlation between these physicochemical properties and the processing factor was established using a multiple linear regression technique. PMID:23281800

10. Non-Linear Wavelet Regression and Branch & Bound Optimization for the Full Identification of Bivariate Operator Fractional Brownian Motion

Frecon, Jordan; Didier, Gustavo; Pustelnik, Nelly; Abry, Patrice

2016-08-01

Self-similarity is widely considered the reference framework for modeling the scaling properties of real-world data. However, most theoretical studies and their practical use have remained univariate. Operator Fractional Brownian Motion (OfBm) was recently proposed as a multivariate model for self-similarity. Yet it has remained seldom used in applications because of serious issues that appear in the joint estimation of its numerous parameters. While the univariate fractional Brownian motion requires the estimation of two parameters only, its mere bivariate extension already involves 7 parameters which are very different in nature. The present contribution proposes a method for the full identification of bivariate OfBm (i.e., the joint estimation of all parameters) through an original formulation as a non-linear wavelet regression coupled with a custom-made Branch & Bound numerical scheme. The estimation performance (consistency and asymptotic normality) is mathematically established and numerically assessed by means of Monte Carlo experiments. The impact of the parameters defining OfBm on the estimation performance as well as the associated computational costs are also thoroughly investigated.

11. Relationships between each part of the spinal curves and upright posture using Multiple stepwise linear regression analysis.

PubMed

Boulet, Sebastien; Boudot, Elsa; Houel, Nicolas

2016-05-01

Back pain is a common reason for consultation in primary healthcare clinical practice, and has effects on daily activities and posture. Relationships between the whole spine and upright posture, however, remain unknown. The aim of this study was to identify the relationship between each spinal curve and centre of pressure position as well as velocity for healthy subjects. Twenty-one male subjects performed quiet stance in natural position. Each upright posture was then recorded using an optoelectronics system (Vicon Nexus) synchronized with two force plates. At each moment, polynomial interpolations of markers attached on the spine segment were used to compute cervical lordosis, thoracic kyphosis and lumbar lordosis angle curves. The mean centre of pressure position and velocity were then computed. Multiple stepwise linear regression analysis showed that the position and velocity of centre of pressure associated with each part of the spinal curves were defined as best predictors of the lumbar lordosis angle (R²=0.45; p=1.65×10⁻¹⁰) and the thoracic kyphosis angle (R²=0.54; p=4.89×10⁻¹³) of healthy subjects in quiet stance. This study showed the relationships between each of cervical, thoracic, lumbar curvatures, and centre of pressure's fluctuation during free quiet standing using non-invasive full spinal curve exploration. PMID:26970888

12. The overlooked potential of generalized linear models in astronomy - III. Bayesian negative binomial regression and globular cluster populations

de Souza, R. S.; Hilbe, J. M.; Buelens, B.; Riggs, J. D.; Cameron, E.; Ishida, E. E. O.; Chies-Santos, A. L.; Killedar, M.

2015-10-01

In this paper, the third in a series illustrating the power of generalized linear models (GLMs) for the astronomical community, we elucidate the potential of the class of GLMs which handles count data. The size of a galaxy's globular cluster (GC) population (NGC) is a prolonged puzzle in the astronomical literature. It falls in the category of count data analysis, yet it is usually modelled as if it were a continuous response variable. We have developed a Bayesian negative binomial regression model to study the connection between NGC and the following galaxy properties: central black hole mass, dynamical bulge mass, bulge velocity dispersion and absolute visual magnitude. The methodology introduced herein naturally accounts for heteroscedasticity, intrinsic scatter, errors in measurements in both axes (either discrete or continuous) and allows modelling the population of GCs on their natural scale as a non-negative integer variable. Prediction intervals of 99 per cent around the trend for expected NGC comfortably envelope the data, notably including the Milky Way, which has hitherto been considered a problematic outlier. Finally, we demonstrate how random intercept models can incorporate information of each particular galaxy morphological type. Bayesian variable selection methodology allows for automatically identifying galaxy types with different productions of GCs, suggesting that on average S0 galaxies have a GC population 35 per cent smaller than other types with similar brightness.

13. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

USGS Publications Warehouse

Linard, Joshua I.

2013-01-01

Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
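The two fit measures quoted in this abstract follow standard formulas, sketched below for a linear model with an intercept. The function names and the numbers in the example are hypothetical and are not taken from the report.

```python
from math import sqrt


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2: penalizes R^2 for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def residual_standard_error(residuals, n_predictors):
    """sqrt(RSS / (n - p - 1)), in the units of the (transformed) response."""
    dof = len(residuals) - n_predictors - 1
    return sqrt(sum(r * r for r in residuals) / dof)


# Hypothetical values: R^2 = 0.90 from 30 observations and 4 predictors.
adj = adjusted_r_squared(0.90, n_obs=30, n_predictors=4)
rse = residual_standard_error([1.0, -1.0, 2.0, -2.0, 0.0, 0.0], n_predictors=1)
print(f"adjusted R^2 = {adj:.3f}, residual standard error = {rse:.3f}")
```

Because the penalty grows with the number of predictors, the adjusted value is a fairer basis than plain R^2 when models built from different combinations of variables are compared.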

14. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

PubMed

Shabri, Ani; Samsudin, Ruhaidah

2014-01-01

Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries at different scales. Then, principal component analysis (PCA) is used to process the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to find the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market is used as the case study. The time series prediction performance of the WMLR model is compared with that of the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666
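The decomposition step can be illustrated with one level of a discrete wavelet transform. The abstract does not name the wavelet family, so the Haar filter used here is an assumption chosen for brevity, and the price series is hypothetical.

```python
from math import sqrt


def haar_step(series):
    """One level of the Haar DWT: returns (approximation, detail)."""
    if len(series) % 2:
        raise ValueError("series length must be even")
    pairs = list(zip(series[0::2], series[1::2]))
    approx = [(a + b) / sqrt(2) for a, b in pairs]  # low-pass: smooth trend
    detail = [(a - b) / sqrt(2) for a, b in pairs]  # high-pass: fluctuations
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the series from one level of Haar coefficients."""
    series = []
    for s, d in zip(approx, detail):
        series += [(s + d) / sqrt(2), (s - d) / sqrt(2)]
    return series


# Hypothetical daily prices.
prices = [95.0, 97.0, 96.0, 99.0, 101.0, 100.0, 98.0, 102.0]
approx, detail = haar_step(prices)
rebuilt = haar_inverse(approx, detail)
print(approx, detail)
```

Applying `haar_step` again to the approximation coefficients gives the next, coarser scale; the resulting subseries are what a regression stage would take as inputs.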

15. Development of a clusterwise-linear-regression-based forecasting system for characterizing DNAPL dissolution behaviors in porous media.

PubMed

Wang, S; Huang, G H; He, L

2012-09-01

Groundwater contamination by dense non-aqueous phase liquids (DNAPLs) has become an issue of great concern in many industrialized countries due to their serious threat to human health. Dissolution and transport of DNAPLs in porous media are complicated, multidimensional and multiphase processes, which pose formidable challenges for investigation of their behaviors and implementation of effective remediation technologies. Numerical simulation models could help gain in-depth insight into complex mechanisms of DNAPLs dissolution and transport processes in the subsurface; however, they were computationally expensive, especially when a large number of runs were required, which was considered as a major obstacle for conducting further analysis. Therefore, proxy models that mimic key characteristics of a full simulation model were desired to save many orders of magnitude of computational cost. In this study, a clusterwise-linear-regression (CLR)-based forecasting system was developed for establishing a statistical relationship between DNAPL dissolution behaviors and system conditions under discrete and nonlinear complexities. The results indicated that the developed CLR-based forecasting system was capable not only of predicting DNAPL concentrations with acceptable error levels, but also of providing a significance level in each cutting/merging step such that the accuracies of the developed forecasting trees could be controlled. This study was a first attempt to apply the CLR model to characterize DNAPL dissolution and transport processes. PMID:22789814

16. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

PubMed Central

Shabri, Ani; Samsudin, Ruhaidah

2014-01-01

Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries at different scales. Then, principal component analysis (PCA) is used to process the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to find the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market is used as the case study. The time series prediction performance of the WMLR model is compared with that of the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666

17. Age-adjustment and related epidemiology rates in education and research.

PubMed

Baker, John D; Kruckman, Laurence; George, Joyce

2006-01-01

A quick review of introductory textbooks reveals that while gerontology authors and instructors introduce some aspects of demography and epidemiology data, there is limited focus on age adjustment or other important epidemiology rates. The goal of this paper is to reintroduce a variety of basic epidemiology measures, such as incidence, prevalence, and crude, age-specific and age-adjusted rates, into the gerontology classroom. Background information and formulas for each rate, as well as examples of how they can be applied, are provided. A recent change, encouraged by the U.S. Department of Health and Human Services, from a 1940 to a 2000 "standard million population" for age-adjusted rates, is reviewed. Finally, a teaching module with answers is provided for use in the gerontology classroom. PMID:16873207
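The difference between a crude rate and an age-adjusted rate can be shown with direct standardization, in which each age-specific rate is weighted by that age group's share of a standard population. The sketch below uses hypothetical counts and standard weights; it does not reproduce the paper's teaching module or the actual 2000 standard population.

```python
def crude_rate(events, population, per=100_000):
    """Total events divided by total population, scaled to `per` persons."""
    return per * sum(events) / sum(population)


def age_adjusted_rate(events, population, standard_weights, per=100_000):
    """Direct standardization: weight each age-specific rate by the share
    of that age group in the standard population."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    age_specific = [e / p for e, p in zip(events, population)]
    return per * sum(r * w for r, w in zip(age_specific, standard_weights))


# Hypothetical community with three age groups (young, middle, old).
deaths = [5, 20, 75]
persons = [10_000, 8_000, 2_000]
standard = [0.5, 0.3, 0.2]  # assumed shares of a standard population

crude = crude_rate(deaths, persons)
adjusted = age_adjusted_rate(deaths, persons, standard)
print(f"crude: {crude:.0f}, age-adjusted: {adjusted:.0f} per 100,000")
```

In this made-up community the crude rate is 500 per 100,000, while the age-adjusted rate is 850 per 100,000 because the standard population gives the oldest group twice the weight it has locally.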

18. Growth comparison of several Escherichia coli strains exposed to various concentrations of lactoferrin using linear spline regression

PubMed Central

2012-01-01

Background: We wanted to compare growth differences between 13 Escherichia coli strains exposed to various concentrations of the growth inhibitor lactoferrin in two different types of broth (Syncase and Luria-Bertani (LB)). To carry this out, we present a simple statistical procedure that separates microbial growth curves that are due to natural random perturbations and growth curves that are more likely caused by biological differences. Bacterial growth was determined using optical density data (OD) recorded for triplicates at 620 nm for 18 hours for each strain. Each resulting growth curve was divided into three equally spaced intervals. We propose a procedure using linear spline regression with two knots to compute the slopes of each interval in the bacterial growth curves. These slopes are subsequently used to estimate a 95% confidence interval based on an appropriate statistical distribution. Slopes outside the confidence interval were considered as significantly different from slopes within. We also demonstrate the use of related, but more advanced methods known collectively as generalized additive models (GAMs) to model growth. In addition to impressive curve fitting capabilities with corresponding confidence intervals, GAMs allow for the computation of derivatives, i.e. growth rate estimation, with respect to each time point. Results: The results from our proposed procedure agreed well with the observed data. The results indicated that there were substantial growth differences between the E. coli strains. Most strains exhibited improved growth in the nutrient rich LB broth compared to Syncase. The inhibiting effect of lactoferrin varied between the different strains. The atypical enteropathogenic aEPEC-2 grew, on average, faster in both broths than the other strains tested, while the enteroinvasive strains EIEC-6 and EIEC-7 grew slower. The enterotoxigenic ETEC-5 strain exhibited exceptional growth in Syncase broth, but slower growth in LB broth
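A linear spline with two knots can be fitted as an ordinary least-squares problem by adding one hinge term per knot. The sketch below assumes knots at 6 and 12 hours (three equal intervals over an 18-hour curve) and uses a noise-free, hypothetical optical-density curve; the model form and helper names are illustrative and not the authors' code.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def spline_slopes(t, y, knot1, knot2):
    """Fit y = b0 + b1*t + b2*(t-knot1)+ + b3*(t-knot2)+ by least squares
    and return the slope within each of the three intervals."""
    rows = [[1.0, ti, max(ti - knot1, 0.0), max(ti - knot2, 0.0)] for ti in t]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    b0, b1, b2, b3 = solve(xtx, xty)
    return b1, b1 + b2, b1 + b2 + b3


# Hypothetical growth curve: slow start, fast middle phase, slow finish.
hours = [float(h) for h in range(19)]
od = [0.05 + 0.02 * h + 0.08 * max(h - 6, 0) - 0.07 * max(h - 12, 0)
      for h in hours]
slopes = spline_slopes(hours, od, 6.0, 12.0)
print([round(s, 3) for s in slopes])
```

The three returned slopes are the per-interval growth rates; repeating the fit for each replicate would give the collection of slopes on which a confidence interval could be based.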

19. Quantitative structure-antibacterial activity relationship modeling using a combination of piecewise linear regression-discriminant analysis (I): Quantum chemical, topographic, and topological descriptors

Molina, Enrique; Estrada, Ernesto; Nodarse, Delvin; Torres, Luis A.; González, Humberto; Uriarte, Eugenio

Time-dependent antibacterial activity of 2-furylethylenes, measured as inhibition of respiration in E. coli, is described using quantum chemical, topographic, and topological indices. A QSAR strategy based on the combination of piecewise linear regression and discriminant analysis is used to predict the biological activity values of strongly and moderately antibacterial furylethylenes. A breakpoint in the values of the biological activity was detected. The biological activities of the compounds are described by two linear regression equations. A discriminant analysis is carried out to classify the compounds into one of the two biological activity groups. The results obtained using different kinds of descriptors were compared. In all cases the piecewise linear regression - discriminant analysis (PLR-DA) method produced significantly better QSAR models than linear regression analysis. The QSAR models were validated using an external validation set previously extracted from the original data. A prediction analysis of reported antibacterial activity was carried out, showing dependence between the probability of a good classification and the experimental antibacterial activity. Statistical parameters showed the quality of prediction of the models based on quantum-chemical descriptors, with LDA having an accuracy of 0.9 and a C of 0.9. The best PLR-DA model explains more than 92% of the variance of experimental activity. Models with the best prediction results were those based on quantum-chemical descriptors. An interpretation of the quantum-chemical descriptors entered in the models was carried out.

20. Cadmium-hazard mapping using a general linear regression model (Irr-Cad) for rapid risk assessment.

PubMed

Simmons, Robert W; Noble, Andrew D; Pongsakul, P; Sukreeyapongse, O; Chinabut, N

2009-02-01

Research undertaken over the last 40 years has identified the irrefutable relationship between the long-term consumption of cadmium (Cd)-contaminated rice and human Cd disease. In order to protect public health and livelihood security, the ability to accurately and rapidly determine spatial Cd contamination is of high priority. During 2001-2004, a General Linear Regression Model Irr-Cad was developed to predict the spatial distribution of soil Cd in a Cd/Zn co-contaminated cascading irrigated rice-based system in Mae Sot District, Tak Province, Thailand (Longitude E 98°59'-E 98°63' and Latitude N 16°67'-16°66'). The results indicate that Irr-Cad accounted for 98% of the variance in mean Field Order total soil Cd. Preliminary validation indicated that Irr-Cad 'predicted' mean Field Order total soil Cd was significantly (p < 0.001) correlated (R² = 0.92) with 'observed' mean Field Order total soil Cd values. Field Order is determined by a given field's proximity to primary outlets from in-field irrigation channels and subsequent inter-field irrigation flows. This in turn determines Field Order in Irrigation Sequence (Field Order(IS)). Mean Field Order total soil Cd represents the mean total soil Cd (aqua regia-digested) for a given Field Order(IS). In 2004-2005, Irr-Cad was utilized to evaluate the spatial distribution of total soil Cd in a 'high-risk' area of Mae Sot District. Secondary validation on six randomly selected field groups verified that Irr-Cad predicted mean Field Order total soil Cd and was significantly (p < 0.001) correlated with the observed mean Field Order total soil Cd with R² values ranging from 0.89 to 0.97. The practical applicability of Irr-Cad is in its minimal input requirements, namely the classification of fields in terms of Field Order(IS), strategic sampling of all primary fields and laboratory based determination of total soil Cd (T-Cd(P)) and the use of a weighed coefficient for Cd (Coeff

1. Multiple Linear Regression Analysis of Factors Affecting Real Property Price Index From Case Study Research In Istanbul/Turkey

Denli, H. H.; Koc, Z.

2015-12-01

Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis constructs mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. When regression analysis is applied to real estate valuation using properties offered in the actual market with their current characteristics and quantifiers, the method helps to find the factors or variables that are effective in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The variables used for the analysis are age, size in m2, number of floors in the building, floor number of the property and number of rooms. The price of the property is the dependent variable, whereas the rest are independent variables. Prices of 60 properties have been used for the analysis. Locations with the same price value have been found and plotted on the map, and equivalence curves have been drawn identifying the same-valued zones as lines.

2. On the development of multi-linear regression analysis to assess energy consumption in the early stages of building design

Shams Amiri, Shideh

Modeling of energy consumption in buildings is essential for different applications such as building energy management and establishing baselines. This makes building energy consumption estimation a key tool to achieve the goals on energy consumption and emissions reduction. The energy performance of a building is complex, since it depends on several parameters related to the building characteristics, equipment and systems, weather, occupants, and sociological influences. This paper presents a new model to predict and quantify energy consumption in commercial buildings in the early stages of the design. eQUEST and DOE-2 building simulation software was used to build and simulate individual building configurations that were generated using the Monte Carlo simulation technique. Ten thousand simulations for seven building shapes were performed to create a comprehensive dataset covering the full ranges of design parameters. The present study considered building materials, their thickness, building shape, and occupant schedule as design variables since building energy performance is sensitive to these variables. Then, the results of the energy simulations were used to fit a set of regression equations to predict the energy consumption in each design scenario. The differences between regression-predicted and DOE-simulated annual building energy consumption are largely within 5%. It is envisioned that the developed regression models can be utilized to estimate the energy savings in the early stages of the design when different building schemes and design concepts are being considered. Keywords: eQUEST simulation, DOE-2 simulation, Monte Carlo simulation, Regression equations, Building energy performance

3. A strategy for multivariate calibration based on modified single-index signal regression: Capturing explicit non-linearity and improving prediction accuracy

Zhang, Xiaoyu; Li, Qingbo; Zhang, Guangjun

2013-11-01

In this paper, a modified single-index signal regression (mSISR) method is proposed to construct a nonlinear and practical model with high accuracy. The mSISR method takes the optimal penalty tuning parameter in P-spline signal regression (PSR) as the initial tuning parameter and chooses the number of cycles by minimizing the root mean squared error of cross-validation (RMSECV). mSISR is superior to single-index signal regression (SISR) in terms of accuracy, computation time and convergence. It can also characterize the non-linearity between spectra and responses more precisely than SISR. Two spectral data sets from basic research experiments, including plant chlorophyll nondestructive measurement and human blood glucose noninvasive measurement, are employed to illustrate the advantages of mSISR. The results indicate that the mSISR method (i) obtains a smooth and helpful regression coefficient vector, (ii) explicitly exhibits the type and amount of the non-linearity, (iii) can take advantage of nonlinear features of the signals to improve prediction performance and (iv) shows distinct adaptability to complex spectral models in comparison with other calibration methods. It is validated that mSISR is a promising nonlinear modeling strategy for multivariate calibration.

4. Quantitative structure-property relationship (QSPR) for the adsorption of organic compounds onto activated carbon cloth: Comparison between multiple linear regression and neural network

SciTech Connect

Brasquet, C.; Bourges, B.; Le Cloirec, P.

1999-12-01

The adsorption of 55 organic compounds is carried out onto a recently discovered adsorbent, activated carbon cloth. Isotherms are modeled using the classical Freundlich model, and the large database generated allows qualitative assumptions about the adsorption mechanism. However, to confirm these assumptions, a quantitative structure-property relationship methodology is used to assess the correlations between an adsorbability parameter (expressed using the Freundlich parameter K) and topological indices related to the compounds' molecular structure (molecular connectivity indices, MCI). This correlation is set up by means of two different statistical tools, multiple linear regression (MLR) and neural network (NN). A principal component analysis is carried out to generate new and uncorrelated variables. It enables the relations between the MCI to be analyzed, but the multiple linear regression assessed using the principal components (PCs) has a poor statistical quality and introduces high-order PCs, too inaccurate for an explanation of the adsorption mechanism. The correlations are thus set up using the original variables (MCI), and both statistical tools, multiple linear regression and neural network, are compared from a descriptive and predictive point of view. To compare the predictive ability of both methods, a test database of 10 organic compounds is used.
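As an illustration of the Freundlich step mentioned above, the parameter K can be estimated by linear regression on log-transformed data, since q = K·C^(1/n) becomes ln q = ln K + (1/n) ln C. The following is a minimal sketch only: the function name `fit_freundlich` and the concentration and uptake values are hypothetical and are not taken from the study.

```python
import math

def fit_freundlich(conc, uptake):
    """Fit q = K * C**(1/n) by least squares on ln q versus ln C."""
    xs = [math.log(c) for c in conc]
    ys = [math.log(q) for q in uptake]
    n_pts = len(xs)
    x_mean = sum(xs) / n_pts
    y_mean = sum(ys) / n_pts
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line is 1/n
    intercept = y_mean - slope * x_mean    # intercept of the log-log line is ln K
    return math.exp(intercept), 1.0 / slope

# Hypothetical, noise-free data generated from K = 2.0 and n = 2.5
conc = [0.5, 1.0, 2.0, 4.0, 8.0]
uptake = [2.0 * c ** (1 / 2.5) for c in conc]
K, n = fit_freundlich(conc, uptake)
print(K, n)
```

With real isotherm data the log transform also changes the error structure, so a direct nonlinear fit may be preferable; the log-log form is used here only because it keeps the sketch short.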

5. Age-adjusted plasma N-terminal pro-brain natriuretic peptide level in Kawasaki disease

PubMed Central

Jun, Heul; Ko, Kyung Ok; Lim, Jae Woo; Yoon, Jung Min; Lee, Gyung Min

2016-01-01

Purpose: Recent reports showed that plasma N-terminal pro-brain natriuretic peptide (NT-proBNP) could be a useful biomarker of intravenous immunoglobulin (IVIG) unresponsiveness and coronary artery lesion (CAL) development in Kawasaki disease (KD). The levels of these peptides are critically influenced by age; hence, the normal range and upper limits for infants and children are different. We performed an age-adjusted analysis of plasma NT-proBNP level to validate its clinical use in the diagnosis of KD. Methods: The data of 131 patients with KD were retrospectively analyzed. The patients were divided into 2 groups, group I (high NT-proBNP group) and group II (normal NT-proBNP group), comprising patients with NT-proBNP concentrations higher and lower than the 95th percentile of the reference value, respectively. We compared the laboratory data, responsiveness to IVIG, and the risk of CAL in both groups. Results: Group I showed significantly higher white blood cell count, absolute neutrophil count, C-reactive protein level, aspartate aminotransferase level, and troponin-I level than group II (P<0.05). The risk of CAL was also significantly higher in group I (odds ratio, 5.78; P=0.012). IVIG unresponsiveness in group I was three times that in group II (odds ratio, 3.35; P=0.005). Conclusion: Age-adjusted analysis of plasma NT-proBNP level could be helpful in predicting IVIG unresponsiveness and risk of CAL development in patients with KD. PMID:27588030

6. Logistic Regression

Grégoire, G.

2014-12-01

Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and the odds ratio, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. Our presentation focuses mainly on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use in comparing nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effects, testing individual effects, selecting variables to build a model, measuring the fit of the model, and predicting new values. The methods are demonstrated on data sets using R. Finally, we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
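The interpretation of coefficients via odds can be made concrete with a small numerical check. In a binary logistic model with log-odds b0 + b1·x, raising x by one unit multiplies the odds by exp(b1). The sketch below uses Python rather than the R of the chapter, and the coefficients `b0` and `b1` are hypothetical values chosen for illustration.

```python
import math

def logistic(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients of a one-covariate binary logistic model
b0, b1 = -1.0, 0.7

p_at_2 = logistic(b0 + b1 * 2.0)   # event probability at x = 2
p_at_3 = logistic(b0 + b1 * 3.0)   # event probability at x = 3

# Ratio of the odds for a one-unit increase in x
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(odds_ratio, math.exp(b1))    # the two numbers agree up to rounding
```

The same identity holds for any pair of x values one unit apart, which is why a single coefficient summarizes the covariate's effect on the odds scale.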

7. Unitary Response Regression Models

ERIC Educational Resources Information Center

Lipovetsky, S.

2007-01-01

The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

8. Linear regression analysis of emissions factors when firing fossil fuels and biofuels in a commercial water-tube boiler

SciTech Connect

Sharon Falcone Miller; Bruce G. Miller

2007-12-15

This paper compares the emissions factors for a suite of liquid biofuels (three animal fats, waste restaurant grease, pressed soybean oil, and a biodiesel produced from soybean oil) and four fossil fuels (i.e., natural gas, No. 2 fuel oil, No. 6 fuel oil, and pulverized coal) in Penn State's commercial water-tube boiler to assess their viability as fuels for green heat applications. The data were broken into two subsets, i.e., fossil fuels and biofuels. The regression model for the liquid biofuels (as a subset) did not perform well for all of the gases. In addition, the coefficient in the models showed the EPA method underestimating CO and NOx emissions. No relation could be studied for SO2 for the liquid biofuels as they contain no sulfur; however, the model showed a good relationship between the two methods for SO2 in the fossil fuels. AP-42 emissions factors for the fossil fuels were also compared to the mass balance emissions factors and EPA CFR Title 40 emissions factors. Overall, the AP-42 emissions factors for the fossil fuels did not compare well with the mass balance emissions factors or the EPA CFR Title 40 emissions factors. Regression analysis of the AP-42, EPA, and mass balance emissions factors for the fossil fuels showed a significant relationship only for CO2 and SO2. However, the regression models underestimate the SO2 emissions by 33%. These tests illustrate the importance of performing material balances around boilers to obtain the most accurate emissions levels, especially when dealing with biofuels. The EPA emissions factors were very good at predicting the mass balance emissions factors for the fossil fuels and to a lesser degree the biofuels. While the AP-42 emissions factors and EPA CFR Title 40 emissions factors are easier to perform, especially in large, full-scale systems, this study illustrated the shortcomings of estimation techniques. 23 refs., 3 figs., 8 tabs.

9. Kernel Continuum Regression.

PubMed

Lee, Myung Hee; Liu, Yufeng

2013-12-01

The continuum regression technique provides an appealing regression framework connecting ordinary least squares, partial least squares and principal component regression in one family. It offers some insight on the underlying regression model for a given application. Moreover, it helps to provide deep understanding of various regression techniques. Despite the useful framework, however, the current development on continuum regression is only for linear regression. In many applications, nonlinear regression is necessary. The extension of continuum regression from linear models to nonlinear models using kernel learning is considered. The proposed kernel continuum regression technique is quite general and can handle very flexible regression model estimation. An efficient algorithm is developed for fast implementation. Numerical examples have demonstrated the usefulness of the proposed technique. PMID:24058224

10. Prediction of acute in vivo toxicity of some amine and amide drugs to rats by multiple linear regression, partial least squares and an artificial neural network.

PubMed

2007-09-01

The oral acute in vivo toxicity of 32 amine and amide drugs was related to their structure-dependent properties. Genetic algorithm-partial least squares and stepwise variable selection were applied to select meaningful descriptors. Multiple linear regression (MLR), artificial neural network (ANN) and partial least squares (PLS) models were created with the selected descriptors. The predictive ability of all three models was evaluated and compared on a set of five drugs, which were not used in the modeling steps. Average errors of 0.168, 0.169 and 0.259 were obtained for MLR, ANN and PLS, respectively. PMID:17878584

11. Intertumor linkage of age-adjusted incidence rate in 15 human neoplasias of both sexes.

PubMed

Kodama, M; Kodama, T; Murakami, M; Yokochi, T

2000-01-01

We report here that the application of the least square method of Gauss to the log-transformed age-adjusted incidence rate changes in time and space, as tested with either the male-female or the female-male tumor pairs for each of 15 tumor entities, has revealed the presence of intertumor linkage that was conditioning the changes of two cancer risk parameters to let them fit to the equilibrium model with close resemblance to the chemical equilibrium model. The dissimilarity of the cancer risk equilibrium model to the chemical equilibrium model--topological dissociation between the equilibrium model of centripetal force (r = -1.000) and that of centrifugal force (r = +1.000)--was discussed in the light of the concept of the oncogene activation-tumor suppressor gene inactivation. The proposed network hypothesis of human neoplasia found supporting evidence in the corresponding changes of the statistical features of human neoplasias with and without sex discrimination of cancer risk. PMID:10836207

12. Application of least squares support vector regression and linear multiple regression for modeling removal of methyl orange onto tin oxide nanoparticles loaded on activated carbon and activated carbon prepared from Pistacia atlantica wood.

PubMed

Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar

2016-01-01

Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of the tree Pistacia atlantica (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal on various influential adsorption parameters was well modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were found to be a γ value of 0.76 and a σ² of 0.15. For the testing data set, a mean square error (MSE) of 0.0010 and a coefficient of determination (R²) of 0.976 were obtained for the LSSVR model, and an MSE of 0.0037 and an R² of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data were found to be well fitted by, and in good agreement with, the Langmuir isotherm model and the second-order equation and intra-particle diffusion models, respectively. Small amounts of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) are sufficient for successful rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g⁻¹ and 125 mg g⁻¹, respectively. PMID:26414425

13. Verifying the performance of artificial neural network and multiple linear regression in predicting the mean seasonal municipal solid waste generation rate: A case study of Fars province, Iran.

PubMed

2016-02-01

Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate. PMID:26482809
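The four performance measures named above can be computed directly from paired observed and predicted values. The sketch below is illustrative: it assumes that "R" denotes the Pearson correlation coefficient between observations and predictions, which the abstract does not spell out, and the sample values are hypothetical rather than data from the Fars Province case study.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    o_mean = sum(obs) / len(obs)
    p_mean = sum(pred) / len(pred)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_p = sum((p - p_mean) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical observed and model-predicted waste generation rates
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(mae(observed, predicted))        # 2.0
print(mape(observed, predicted))       # 10.625
print(rmse(observed, predicted))
print(pearson_r(observed, predicted))
```

MAE and RMSE are in the units of the response, MAPE is scale-free, and R measures association rather than agreement, so a model can have a high R and still be biased.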

14. [Body mass index and its relationship to nutritional and socioeconomic variables: a linear regression approach to a Brazilian adult sub-population].

PubMed

Leite de Vasconcellos, M T; Portela, M C

2001-01-01

This paper focuses on the relationship between body mass index (BMI) and family energy intake, occupational energy expenditure, per capita family expenditure, sex, age, and left arm circumference for a group of Brazilian adults randomly selected among those interviewed for a survey on food consumption and family budgets, called the National Family Expenditure Survey. The authors discuss linear regression methodological issues related to treatment of outliers and influential cases, multicollinearity, model specification, heteroscedasticity, as well as the use of two-level variables derived from samples with complex design. The results indicate that the model is not affected by outliers and that there are no significant specification errors. They also show a significant linear relationship between BMI and the variables listed above. Although the hypothesis tests indicate significant heteroscedasticity, its corrections did not significantly change the model's parameters, probably due to the sample size (14,000 adults), making hypothesis tests more rigorous than desired. PMID:11784903

15. Modeling the proportion of cut slopes rock on forest roads using artificial neural network and ordinal linear regression.

PubMed

Babapour, R; Naghdi, R; Ghajar, I; Ghodsi, R

2015-07-01

The rock proportion of the subsoil directly influences the cost of embankment in forest road construction. Therefore, developing a reliable framework for estimating the rock ratio prior to road planning could lead to lighter excavation and lower-cost operations. Prediction of rock proportion was subjected to statistical analyses using an Artificial Neural Network (ANN) in MATLAB and five link functions of ordinal logistic regression (OLR), according to the rock type and terrain slope properties. In addition to bed rock and slope maps, more than 100 sample data of rock proportion, observed by geologists, were collected from any available bed rock of every slope class. Four predictive models were developed for rock proportion, employing independent variables and applying both the selected probit link function of OLR and Layer Recurrent and Feed-forward back-propagation neural networks. In the ANN, different numbers of neurons were considered for the hidden layer(s). Goodness-of-fit measures showed that the ANN models produced better results than OLR, with R² = 0.72 and Root Mean Square Error = 0.42. Furthermore, in order to show the applicability of the proposed approach, and to illustrate the variability of rock proportion resulting from the model application, the optimum models were applied to a mountainous forest where a forest road network had been constructed in the past. PMID:26092244

16. Tightrope Walking: Using Predictors of 25(OH)D Concentration Based on Multivariable Linear Regression to Infer Associations with Health Risks

PubMed Central

Ding, Ning; Dear, Keith; Guo, Shuyu; Xiang, Fan; Lucas, Robyn

2015-01-01

The debate on the causal association between vitamin D status, measured as serum concentration of 25-hydroxyvitamin D (25[OH]D), and various health outcomes warrants investigation in large-scale health surveys. Measuring the 25(OH)D concentration for each participant is not always feasible, because of the logistics of blood collection and the costs of vitamin D testing. To address this problem, past research has used predicted 25(OH)D concentration, based on multivariable linear regression, as a proxy for unmeasured vitamin D status. We restate this approach in a mathematical framework, to deduce its possible pitfalls. Monte Carlo simulation and real data from the National Health and Nutrition Examination Survey 2005–06 are used to confirm the deductions. The results indicate that variables that are used in the prediction model (for 25[OH]D concentration) but not in the model for the health outcome (called instrumental variables), play an essential role in the identification of an effect. Such variables should be unrelated to the health outcome other than through vitamin D; otherwise the estimate of interest will be biased. The approach of predicted 25(OH)D concentration derived from multivariable linear regression may be valid. However, careful verification that the instrumental variables are unrelated to the health outcome is required. PMID:26017695

17. Use of a non-linear spline regression to model time-varying fluctuations in mammary-secretion element concentrations of periparturient mares in Michigan, USA.

PubMed

Lloyd, J W; Rook, J S; Braselton, E; Shea, M E

2000-02-01

A study was designed to model the fluctuations of nine specific element concentrations in mammary secretions from periparturient mares over time. During the 1992 foaling season, serial samples of mammary secretions were collected from all 18 pregnant Arabian mares at the Michigan State University equine teaching and research center. Non-linear regression techniques were used to model the relationship between element concentration in mammary secretions and days from foaling (which connected two separate sigmoid curves with a spline function); indicator variables were included for mare and mare parity. Element concentrations in mammary secretions varied significantly during the periparturient period in mares. Both time trends and individual variability explained a significant portion of the variation in these element concentrations. Multiparous mares had lower concentrations of K and Zn, but higher concentrations of Na. Substantial serial and spatial correlation were detected in spite of modeling efforts to avoid the problem. As a result, p-values obtained for parameter estimates were likely biased toward zero. Nonetheless, results of this analysis indicate that monitoring changes in mammary-secretion element concentrations might reasonably be used as a predictor of impending parturition in the mare. In addition, these results suggest that element concentrations warrant attention in the development of neonatal milk-replacement therapies. This study demonstrates that non-linear regression can be used successfully to model time-series data in animal-health management. This approach should be considered by investigators facing similar analytical challenges. PMID:10782599

18. Monte Carlo simulation of parameter confidence intervals for non-linear regression analysis of biological data using Microsoft Excel.

PubMed

Lambert, Ronald J W; Mytilinaios, Ioannis; Maitland, Luke; Brown, Angus M

2012-08-01

This study describes a method to obtain parameter confidence intervals from the fitting of non-linear functions to experimental data, using the SOLVER and Analysis ToolPak add-ins of the Microsoft Excel spreadsheet. Previously we have shown that Excel can fit complex multiple functions to biological data, obtaining values equivalent to those returned by more specialized statistical or mathematical software. However, a disadvantage of using the Excel method was the inability to return confidence intervals for the computed parameters or the correlations between them. Using a simple Monte-Carlo procedure within the Excel spreadsheet (without recourse to programming), SOLVER can provide parameter estimates (up to 200 at a time) for multiple 'virtual' data sets, from which the required confidence intervals and correlation coefficients can be obtained. The general utility of the method is exemplified by applying it to the analysis of the growth of Listeria monocytogenes, the growth inhibition of Pseudomonas aeruginosa by chlorhexidine and the further analysis of the electrophysiological data from the compound action potential of the rodent optic nerve. PMID:21764476
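The Monte Carlo idea described above (refit many 'virtual' data sets, then read confidence limits off the spread of the refitted parameters) can be sketched outside Excel as follows. This is a simplified illustration under stated assumptions, not the authors' spreadsheet procedure: it fits a straight line instead of a non-linear function, draws Gaussian noise with the residual standard deviation, and uses hypothetical data; the function names are invented for the example.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope of y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def percentile(sorted_vals, q):
    """Percentile (q in 0..100) of a sorted list, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def monte_carlo_ci(xs, ys, n_sim=200, seed=0):
    """Return the fitted slope and a 95% Monte Carlo interval for it."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    # Residual standard deviation with two fitted parameters
    sd = math.sqrt(sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (len(xs) - 2))
    slopes = []
    for _ in range(n_sim):
        # One 'virtual' data set: fitted curve plus simulated noise
        virtual = [f + rng.gauss(0.0, sd) for f in fitted]
        slopes.append(fit_line(xs, virtual)[1])
    slopes.sort()
    return b, percentile(slopes, 2.5), percentile(slopes, 97.5)

# Hypothetical measurements scattered around y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.2]
slope, lower, upper = monte_carlo_ci(xs, ys)
print(slope, lower, upper)
```

Keeping the refitted intercepts alongside the slopes would also give the correlation between parameter estimates, which is the second quantity the abstract mentions.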

19. Internal correction of spectral interferences and mass bias for selenium metabolism studies using enriched stable isotopes in combination with multiple linear regression.

PubMed

Lunøe, Kristoffer; Martínez-Sierra, Justo Giner; Gammelgaard, Bente; Alonso, J Ignacio García

2012-03-01

The analytical methodology for the in vivo study of selenium metabolism using two enriched selenium isotopes has been modified, allowing for the internal correction of spectral interferences and mass bias both for total selenium and speciation analysis. The method is based on the combination of an already described dual-isotope procedure with a new data treatment strategy based on multiple linear regression. A metabolic enriched isotope (⁷⁷Se) is given orally to the test subject and a second isotope (⁷⁴Se) is employed for quantification. In our approach, all possible polyatomic interferences occurring in the measurement of the isotope composition of selenium by collision cell quadrupole ICP-MS are taken into account and their relative contribution calculated by multiple linear regression after minimisation of the residuals. As a result, all spectral interferences and mass bias are corrected internally allowing the fast and independent quantification of natural abundance selenium (ⁿᵃᵗSe) and enriched ⁷⁷Se. In this sense, the calculation of the tracer/tracee ratio in each sample is straightforward. The method has been applied to study the time-related tissue incorporation of ⁷⁷Se in male Wistar rats while maintaining the ⁿᵃᵗSe steady-state conditions. Additionally, metabolically relevant information such as selenoprotein synthesis and selenium elimination in urine could be studied using the proposed methodology. In this case, serum proteins were separated by affinity chromatography while reverse phase was employed for urine metabolites. In both cases, ⁷⁴Se was used as a post-column isotope dilution spike. The application of multiple linear regression to the whole chromatogram allowed us to calculate the contribution of bromine hydride, selenium hydride, argon polyatomics and mass bias on the observed selenium isotope patterns. By minimising the square sum of residuals for the whole chromatogram, internal correction of spectral interferences and mass

20. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

NASA Technical Reports Server (NTRS)

Whitlock, C. H., III

1977-01-01

Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies, provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

1. A study on the reconstitution of daily PM10 and PM2.5 levels in Paris with a multivariate linear regression model

Dimitriou, Konstantinos; Kassomenos, Pavlos

2014-12-01

The amount of time air spends over a region is linearly related to the region's contribution to PM. The residence time of air masses over emission sources was the main criterion for the division into 15 regions of origin. Daily PM concentrations in Paris (France) were reconstituted by multiplying the air mass residence time for each of the 15 regions by a regression coefficient (Bk) expressing the ability of each region to enrich the daily PM concentrations. The comparison between observed and predicted values gave satisfactory results. Local regions contributed cumulatively more than 50% of PM2.5 and PM10 on an average daily basis, whereas the residing areas of air parcels were particularly located around the city. Due to the scarceness of eastern circulation, continental airflows were associated with few episodes of extreme aerosol contributions, whereas peak air mass residence time values were isolated above Germany.
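The reconstitution described above can be written compactly as a sum over regions. The notation below is an assumption made for illustration: the abstract names only the coefficients Bk, so the residence-time symbol and the day index are introduced here, and no intercept term is shown because none is mentioned.

```latex
% Assumed notation: \tau_{d,k} is the air mass residence time over region k on day d,
% B_k is the regression coefficient of region k, and the hat marks the reconstituted value.
\widehat{\mathrm{PM}}_{d} \;=\; \sum_{k=1}^{15} B_{k}\,\tau_{d,k}
```

Under this reading, the term for region k is that region's estimated contribution to the PM level on day d, which is how a cumulative share such as "more than 50%" for local regions can be obtained.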

2. A multivariate linear regression model for predicting children's blood lead levels based on soil lead levels: A study at four Superfund sites

SciTech Connect

Lewin, M.D.; Sarasua, S.; Jones, P.A. (Div. of Health Studies)

1999-07-01

3. Partial Least-Squares and Linear Support Vector Regression Chemometric Methods for Simultaneous Determination of Amoxicillin Trihydrate and Dicloxacillin Sodium in the Presence of Their Common Impurity.

PubMed

Naguib, Ibrahim A; Abdelaleem, Eglal A; Zaazaa, Hala E; Hussein, Essraa A

2016-07-01

Two multivariate chemometric models, namely, partial least-squares regression (PLSR) and linear support vector regression (SVR), are presented for the analysis of amoxicillin trihydrate and dicloxacillin sodium in the presence of their common impurity (6-aminopenicillanic acid) in raw materials and in pharmaceutical dosage form via handling UV spectral data and making a modest comparison between the two models, highlighting the advantages and limitations of each. For optimum analysis, a three-factor, four-level experimental design was established, resulting in a training set of 16 mixtures containing different ratios of interfering species. To validate the prediction ability of the suggested models, an independent test set consisting of eight mixtures was used. The presented results show the ability of the two proposed models to determine the two drugs simultaneously in the presence of small levels of the common impurity with high accuracy and selectivity. The analysis results of the dosage form were statistically compared to a reported HPLC method, with no significant difference regarding accuracy and precision, indicating the ability of the suggested multivariate calibration models to be reliable and suitable for routine analysis of the drug product. Compared to the PLSR model, the SVR model gives more accurate results with a lower prediction error, as well as high generalization ability; however, the PLSR model is easy to handle and fast to optimize. PMID:27305461

4. Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table

Huttunen, Jani; Kokkola, Harri; Mielonen, Tero; Esa Juhani Mononen, Mika; Lipponen, Antti; Reunanen, Juha; Vilhelm Lindfors, Anders; Mikkonen, Santtu; Erkki Juhani Lehtinen, Kari; Kouremeti, Natalia; Bais, Alkiviadis; Niska, Harri; Arola, Antti

2016-07-01

In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during

5. Orthogonal Regression: A Teaching Perspective

ERIC Educational Resources Information Center

Carr, James R.

2012-01-01

A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
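For a straight line, the major axis and reduced major axis slopes both have closed forms in terms of the centered sums of squares Sxx, Syy and the cross-product Sxy. The sketch below compares them with the ordinary least squares slope; the function names and data are hypothetical, and the formulas assume Sxy is non-zero.

```python
import math

def centered_sums(xs, ys):
    """Return the means and the centered sums Sxx, Syy, Sxy."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return x_mean, y_mean, sxx, syy, sxy

def ols_slope(xs, ys):
    """Ordinary least squares: minimizes squared vertical distances."""
    _, _, sxx, _, sxy = centered_sums(xs, ys)
    return sxy / sxx

def major_axis(xs, ys):
    """Orthogonal regression: minimizes squared perpendicular distances."""
    x_mean, y_mean, sxx, syy, sxy = centered_sums(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return y_mean - slope * x_mean, slope

def reduced_major_axis(xs, ys):
    """Reduced major axis: slope is the ratio of standard deviations, signed by Sxy."""
    x_mean, y_mean, sxx, syy, sxy = centered_sums(xs, ys)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return y_mean - slope * x_mean, slope

# Hypothetical scattered data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 3.9, 6.1, 8.0, 9.8]
print(ols_slope(xs, ys))
print(major_axis(xs, ys))
print(reduced_major_axis(xs, ys))
```

When the points lie exactly on a line all three slopes coincide; with scatter, the least squares slope is the smallest in magnitude because it attributes all error to y.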

6. Robust Regression.

PubMed

Huang, Dong; Cabral, Ricardo; De la Torre, Fernando

2016-02-01

Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features (X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740

7. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

2016-06-01

Diameter at breast height (DBH) estimation is a prerequisite for various allometric equations that estimate important forestry indices such as stem volume, basal area, biomass and carbon stock. LiDAR technology provides a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of the point cloud unique to different forest classes. An extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and species were identified for comparison with LiDAR derivatives. Multiple linear regression was used to obtain LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated and automated using Python scripts, with additional regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yielded the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at a 10 m grid size, with an r-squared of 0.604, an AIC of 154.04, and a BIC of 175.08. The combination of parameters may differ among forest classes, which is left for further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be added to help determine the correlation among parameters.
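The exhaustive search over parameter combinations described in this record can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scripts: the function names `fit_ols` and `best_subset` are hypothetical, the data are synthetic, AIC and BIC use the Gaussian log-likelihood form up to a constant, and ranking by AIC first is an arbitrary choice for the example.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns R^2, AIC, and BIC."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    k = A.shape[1]                                  # number of fitted coefficients
    # Gaussian form, up to an additive constant shared by all candidate models
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return 1.0 - rss / tss, aic, bic


def best_subset(X, y, max_size=None):
    """Score every non-empty predictor subset; return the lowest-AIC entry."""
    p = X.shape[1]
    max_size = max_size or p
    results = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(p), size):
            r2, aic, bic = fit_ols(X[:, list(cols)], y)
            results.append((aic, bic, r2, cols))
    return min(results)                             # tuples sort by AIC first


# Synthetic stand-in data (hypothetical): only columns 0 and 2 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
aic, bic, r2, cols = best_subset(X, y)
print("selected columns:", cols, "R^2:", round(r2, 3))
```

With p candidate predictors the search fits 2^p - 1 models, so for 27 parameters a cap such as `max_size` (an assumption of this sketch) or a more selective search strategy would likely be needed in practice.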

8. Non-parametric linear regression of discrete Fourier transform convoluted chromatographic peak responses under non-ideal conditions of internal standard method.

PubMed

Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A

2010-11-15

This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-points sin x(i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by that of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely: overlapping chromatographic peaks and very low analyte concentrations. For example, a significant change in the correlation coefficient of sodium benzoate, in case of overlapping peaks, went from 0.9975 to 0.9998 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. Also a significant improvement in the precision and accuracy for the determination of synthetic mixtures and dosage forms in non-ideal cases was achieved. For example, in the case of overlapping peaks guaiphenesin mean recovery% and RSD% went from 91.57, 9.83 to 100.04, 0.78 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and

9. Correlation of results obtained by in-vivo optical spectroscopy with measured blood oxygen saturation using a positive linear regression fit

McCormick, Patrick W.; Lewis, Gary D.; Dujovny, Manuel; Ausman, James I.; Stewart, Mick; Widman, Ronald A.

1992-05-01

Near infrared light generated by specialized instrumentation was passed through artificially oxygenated human blood during simultaneous sampling by a co-oximeter. Characteristic absorption spectra were analyzed to calculate the ratio of oxygenated to reduced hemoglobin. A positive linear regression fit between diffuse transmission oximetry and measured blood oxygenation over the range 23% to 99% (r² = 0.98, p < .001) was noted. The same technology was used to pass two channels of light through the scalp of brain-injured patients with prolonged, decreased level of consciousness in a tertiary care neuroscience ICU. Transmission data were collected with gross superficial-to-deep spatial resolution. Saturation calculation based on the deep signal was observed in the patient over time. The procedure could be performed clinically without difficulty; rSO2 values recorded continuously demonstrate the usefulness of the technique. Using the same instrumentation, arterial input and cerebral response functions, generated by IV tracer bolus, were deconvoluted to measure mean cerebral transit time. Data collected over time provided a sensitive index of changes in cerebral blood flow as a result of therapeutic maneuvers.

10. Modelling the Relationship Between Land Surface Temperature and Landscape Patterns of Land Use Land Cover Classification Using Multi Linear Regression Models

Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.

2016-06-01

The threat of urbanization-related ailments such as heat stress is very prevalent. Much can be done to lessen the effect of urbanization on the surface temperature of an area, such as using green roofs or planting trees. Land use therefore matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important so as to provide a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC as well as to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and LULC classification was derived from LiDAR and Orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs and these metrics were analysed using a statistical framework. Multiple linear regression was used to create models that would predict LST for each class, and it was found that the spatial metric "Effective mesh size" was a top predictor for LST in 6 out of 7 classes. The model created can still be refined by adding a temporal aspect by analysing the LST of another farming period (for rural areas) and looking for common predictors between LSTs of these two different farming periods.

11. Bisphenol-A exposures and behavioural aberrations: median and linear spline and meta-regression analyses of 12 toxicity studies in rodents.

PubMed

Peluso, Marco E M; Munnia, Armelle; Ceppi, Marcello

2014-11-01

Exposures to bisphenol-A, a weak estrogenic chemical, largely used for the production of plastic containers, can affect rodent behaviour. Thus, we examined the relationships between bisphenol-A and the anxiety-like behaviour, spatial skills, and aggressiveness, in 12 toxicity studies of rodent offspring from females orally exposed to bisphenol-A, while pregnant and/or lactating, by median and linear splines analyses. Subsequently, the meta-regression analysis was applied to quantify the behavioural changes. U-shaped, inverted U-shaped and J-shaped dose-response curves were found to describe the relationships between bisphenol-A with the behavioural outcomes. The occurrence of anxiogenic-like effects and spatial skill changes displayed U-shaped and inverted U-shaped curves, respectively, providing examples of effects that are observed at low-doses. Conversely, a J-dose-response relationship was observed for aggressiveness. When the proportion of rodents expressing certain traits or the time that they employed to manifest an attitude was analysed, the meta-regression indicated that a borderline significant increment of anxiogenic-like effects was present at low-doses regardless of sexes (β)=-0.8%, 95% C.I. -1.7/0.1, P=0.076, at ≤120 μg bisphenol-A. Whereas, only bisphenol-A-males exhibited a significant inhibition of spatial skills (β)=0.7%, 95% C.I. 0.2/1.2, P=0.004, at ≤100 μg/day. A significant increment of aggressiveness was observed in both the sexes (β)=67.9, C.I. 3.4, 172.5, P=0.038, at >4.0 μg. Then, bisphenol-A treatments significantly abrogated spatial learning and ability in males (P<0.001 vs. females). Overall, our study showed that developmental exposures to low-doses of bisphenol-A, e.g. ≤120 μg/day, were associated with behavioural aberrations in offspring. PMID:25242006

12. Ridge Regression: A Regression Procedure for Analyzing Correlated Independent Variables.

ERIC Educational Resources Information Center

Rakow, Ernest A.

Ridge regression is presented as an analytic technique to be used when predictor variables in a multiple linear regression situation are highly correlated, a situation which may result in unstable regression coefficients and difficulties in interpretation. Ridge regression avoids the problem of selection of variables that may occur in stepwise…
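To make the instability mentioned in this record concrete, the following sketch compares ordinary least squares with a closed-form ridge estimate on two nearly collinear predictors. It is an illustration under stated assumptions rather than the procedure from the record: the helper `ridge_fit`, the penalty value `lam=1.0`, and the synthetic data are all hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate on centred data: (X'X + lam*I)^-1 X'y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y.mean() - X.mean(axis=0) @ coef    # intercept is not penalised
    return intercept, coef


# Hypothetical data: two nearly collinear predictors, true coefficients (1, 1).
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=100)

_, ols_coef = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
_, ridge_coef = ridge_fit(X, y, lam=1.0)
print("OLS:  ", ols_coef)
print("ridge:", ridge_coef)
```

The ridge coefficients always have a norm no larger than the least squares coefficients; with collinear predictors the penalty mainly damps the poorly determined difference between the two coefficients while leaving their well-determined sum almost unchanged.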

13. Taking into account latency, amplitude, and morphology: improved estimation of single-trial ERPs by wavelet filtering and multiple linear regression

PubMed Central

Hu, L.; Liang, M.; Mouraux, A.; Wise, R. G.; Hu, Y.

2011-01-01

Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLRd) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLRd method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLRd approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLRd effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLRd can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli. PMID:21880936

14. Development of multiple linear regression models as predictive tools for fecal indicator concentrations in a stretch of the lower Lahn River, Germany.

PubMed

Herrig, Ilona M; Böer, Simone I; Brennholt, Nicole; Manz, Werner

2015-11-15

Since rivers are typically subject to rapid changes in microbiological water quality, tools are needed to allow timely water quality assessment. A promising approach is the application of predictive models. In our study, we developed multiple linear regression (MLR) models in order to predict the abundance of the fecal indicator organisms Escherichia coli (EC), intestinal enterococci (IE) and somatic coliphages (SC) in the Lahn River, Germany. The models were developed on the basis of an extensive set of environmental parameters collected during a 12-month monitoring period. Two models were developed for each type of indicator: 1) an extended model including the maximum number of variables significantly explaining variations in indicator abundance and 2) a simplified model reduced to the three most influential explanatory variables, thus obtaining a model which is less resource-intensive with regard to required data. Both approaches have the ability to model multiple sites within one river stretch. The three most important predictive variables in the optimized models for the bacterial indicators were NH4-N, turbidity and global solar irradiance, whereas chlorophyll a content, discharge and NH4-N were reliable model variables for somatic coliphages. Depending on indicator type, the extended mode models also included the additional variables rainfall, O2 content, pH and chlorophyll a. The extended mode models could explain 69% (EC), 74% (IE) and 72% (SC) of the observed variance in fecal indicator concentrations. The optimized models explained the observed variance in fecal indicator concentrations to 65% (EC), 70% (IE) and 68% (SC). Site-specific efficiencies ranged up to 82% (EC) and 81% (IE, SC). Our results suggest that MLR models are a promising tool for a timely water quality assessment in the Lahn area. PMID:26318647

15. Investigation of the relationship between very warm days in Romania and large-scale atmospheric circulation using multiple linear regression approach

Barbu, N.; Cuculeanu, V.; Stefan, S.

2015-08-01

The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. In order to achieve this, two catalogues from COST733Action were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and have been utilized as predictors within the multiple linear regression model (MLRM) for the estimation of winter and summer TX90p values for 85 synoptic stations covering the entire Romania. A forward selection procedure has been utilized to find adequate predictor combinations and those predictor combinations were tested for collinearity. The performance of the MLRMs has been quantified based on the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level in order to obtain reliable evidence of MLRM robustness. From this analysis, it can be stated that the MLRM performance is higher in winter compared to summer. This is due to the annual cycle of incoming insolation and to the local factors such as orography and surface albedo variations. The MLRM performances exhibit distinct variations between regions with high performance in wintertime for the eastern and southern part of the country and in summertime for the western part of the country. One can conclude that the MLRM generally captures quite well the TX90p variability and reveals the potential for statistical downscaling of TX90p values based on circulation types.

16. Can Linear Regression Modeling Help Clinicians in the Interpretation of Genotypic Resistance Data? An Application to Derive a Lopinavir-Score

PubMed Central

Cozzi-Lepri, Alessandro; Prosperi, Mattia C. F.; Kjær, Jesper; Dunn, David; Paredes, Roger; Sabin, Caroline A.; Lundgren, Jens D.; Phillips, Andrew N.; Pillay, Deenan

2011-01-01

Background The question of whether a score for a specific antiretroviral (e.g. lopinavir/r in this analysis) that improves prediction of viral load response given by existing expert-based interpretation systems (IS) could be derived from analyzing the correlation between genotypic data and virological response using statistical methods remains largely unanswered. Methods and Findings We used the data of the patients from the UK Collaborative HIV Cohort (UK CHIC) Study for whom genotypic data were stored in the UK HIV Drug Resistance Database (UK HDRD) to construct a training/validation dataset of treatment change episodes (TCE). We used the average square error (ASE) on a 10-fold cross-validation and on a test dataset (the EuroSIDA TCE database) to compare the performance of a newly derived lopinavir/r score with that of the 3 most widely used expert-based interpretation rules (ANRS, HIVDB and Rega). Our analysis identified mutations V82A, I54V, K20I and I62V, which were associated with reduced viral response and mutations I15V and V91S which determined lopinavir/r hypersensitivity. All models performed equally well (ASE on test ranging between 1.1 and 1.3, p = 0.34). Conclusions We fully explored the potential of linear regression to construct a simple predictive model for lopinavir/r-based TCE. Although the performance of our proposed score was similar to that of already existing IS, previously unrecognized lopinavir/r-associated mutations were identified. The analysis illustrates an approach of validation of expert-based IS that could be used in the future for other antiretrovirals and in other settings outside HIV research. PMID:22110581

17. Improved Regression Calibration

ERIC Educational Resources Information Center

Skrondal, Anders; Kuha, Jouni

2012-01-01

The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…

18. Prediction of spatial soil property information from ancillary sensor data using ordinary linear regression: Model derivations, residual assumptions and model validation tests

Technology Transfer Automated Retrieval System (TEKTRAN)

Geospatial measurements of ancillary sensor data, such as bulk soil electrical conductivity or remotely sensed imagery data, are commonly used to characterize spatial variation in soil or crop properties. Geostatistical techniques like kriging with external drift or regression kriging are often use...

19. A method for the selection of a functional form for a thermodynamic equation of state using weighted linear least squares stepwise regression

NASA Technical Reports Server (NTRS)

Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.

1976-01-01

A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.

20. A Free-Knot Spline Modeling Framework for Piecewise Linear Logistic Regression in Complex Samples with Body Mass Index and Mortality as an Example

PubMed Central

Keith, Scott W.; Allison, David B.

2014-01-01

This paper details the design, evaluation, and implementation of a framework for detecting and modeling non-linearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and non-parametric bootstrapping. Unlike other non-linear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI, kg/m2) and the complex multistage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic non-linear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and non-linearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz’s Bayesian Information Criterion (BIC) and Akaike’s Information Criterion (AIC), in terms of correctly selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions. PMID:25610831

1. Morse-Smale Regression

PubMed Central

Gerber, Samuel; Rübel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.

2012-01-01

This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse-Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this paper introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to over-fitting. The Morse-Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse-Smale regression. Supplementary materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse-Smale complex approximation and additional tables for the climate-simulation study. PMID:23687424

2. Morse–Smale Regression

SciTech Connect

Gerber, Samuel; Rubel, Oliver; Bremer, Peer -Timo; Pascucci, Valerio; Whitaker, Ross T.

2012-01-19

This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.

3. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

USGS Publications Warehouse

Granato, Gregory E.

2006-01-01

The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
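The slope and intercept rules stated in this record (median of all pairwise slopes, line through the medians of the data) can be sketched in a few lines. This is a minimal illustration of the single-line case only, not the KTRLine program; the function name `kendall_theil_line` and the sample data are hypothetical.

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(x, y):
    """Return (slope, intercept) of the Kendall-Theil robust line."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]                 # skip vertical pairs (undefined slope)
    ]
    slope = median(slopes)
    # Choose the intercept so the line passes through (median x, median y).
    intercept = median(y) - slope * median(x)
    return slope, intercept


# Hypothetical data on y = 2x + 1, with one gross outlier in the last point.
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 3, 5, 7, 9, 11, 60]
slope, intercept = kendall_theil_line(x, y)
print(slope, intercept)   # prints 2.0 1.0
```

In this example 15 of the 21 pairwise slopes equal 2, so the median slope ignores the outlier, whereas an ordinary least squares fit to the same points would be pulled toward it. The pairwise loop is quadratic in the number of points, which is acceptable for a sketch but may need a faster algorithm for large data sets.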

4. Changes in Age-Adjusted Mortality Rates and Disparities for Rural Physician Shortage Areas Staffed by the National Health Service Corps: 1984-1998

ERIC Educational Resources Information Center

Pathman, Donald E.; Fryer, George E.; Green, Larry A.; Phillips, Robert L.

2005-01-01

Objective: This study assesses whether the National Health Service Corps's legislated goals to see health improve and health disparities lessen are being met in rural health professional shortage areas for a key population health indicator: age-adjusted mortality. Methods: In a descriptive study using a pre-post design with comparison groups, the…

5. Changes in Age-Adjusted Mortality Rates and Disparities for Rural Physician Shortage Areas Staffed by the National Health Service Corps: 1984-1998

ERIC Educational Resources Information Center

Pathman, Donald E.; Fryer, George E.; Green, Larry A.; Phillips, Robert L.

2005-01-01

This study assesses whether the National Health Service Corps's legislated goals to see health improve and health disparities lessen are being met in rural health professional shortage areas for a key population health indicator: age-adjusted mortality. In a descriptive study using a pre-post design with comparison groups, the authors calculated…

6. Novel approaches to the calculation and comparison of thermoregulatory parameters: Non-linear regression of metabolic rate and evaporative water loss in Australian rodents.

PubMed

Tomlinson, Sean

2016-04-01

The calculation and comparison of physiological characteristics of thermoregulation has provided insight into patterns of ecology and evolution for over half a century. Thermoregulation has typically been explored using linear techniques; I explore the application of non-linear scaling to more accurately calculate and compare characteristics and thresholds of thermoregulation, including the basal metabolic rate (BMR), peak metabolic rate (PMR) and the lower (Tlc) and upper (Tuc) critical limits to the thermo-neutral zone (TNZ) for Australian rodents. An exponentially-modified logistic function accurately characterised the response of metabolic rate to ambient temperature, while evaporative water loss was accurately characterised by a Michaelis-Menten function. When these functions were used to resolve unique parameters for the nine species studied here, the estimates of BMR and TNZ were consistent with the previously published estimates. The approach resolved differences in rates of metabolism and water loss between subfamilies of Australian rodents that haven't been quantified before. I suggest that non-linear scaling is not only more effective than the established segmented linear techniques, but also is more objective. This approach may allow broader and more flexible comparison of characteristics of thermoregulation, but it needs testing with a broader array of taxa than those used here. PMID:27033039

7. Linear support vector regression and partial least squares chemometric models for determination of Hydrochlorothiazide and Benazepril hydrochloride in presence of related impurities: A comparative study

Naguib, Ibrahim A.; Abdelaleem, Eglal A.; Draz, Mohammed E.; Zaazaa, Hala E.

2014-09-01

Partial least squares regression (PLSR) and support vector regression (SVR) are two popular chemometric models that are subjected to a comparative study in the presented work. The comparison shows their characteristics by applying them to analyze Hydrochlorothiazide (HCZ) and Benazepril hydrochloride (BZ) in the presence of the HCZ impurities Chlorothiazide (CT) and Salamide (DSA) as a case study. The analysis results prove to be valid for analysis of the two active ingredients in raw materials and pharmaceutical dosage form through handling UV spectral data in the range 220-350 nm. For proper analysis, a 4-factor, 4-level experimental design was established, resulting in a training set consisting of 16 mixtures containing different ratios of interfering species. An independent test set consisting of 8 mixtures was used to validate the prediction ability of the suggested models. The results presented indicate the ability of the mentioned multivariate calibration models to analyze HCZ and BZ in the presence of the HCZ impurities CT and DSA with high selectivity and accuracy, with mean percentage recoveries of (101.01 ± 0.80) and (100.01 ± 0.87) for HCZ and BZ respectively using the PLSR model and of (99.78 ± 0.80) and (99.85 ± 1.08) for HCZ and BZ respectively using the SVR model. The analysis results of the dosage form were statistically compared to the reference HPLC method with no significant differences regarding accuracy and precision. The SVR model gives more accurate results than the PLSR model and shows high generalization ability; however, PLSR retains the advantage of being fast to optimize and implement.

8. Examining Non-Linear Associations between Accelerometer-Measured Physical Activity, Sedentary Behavior, and All-Cause Mortality Using Segmented Cox Regression

PubMed Central

Lee, Paul H.

2016-01-01

Healthy adults are advised to perform at least 150 min of moderate-intensity physical activity weekly, but this advice is based on studies using self-reports of questionable validity. This study examined the dose-response relationship of accelerometer-measured physical activity and sedentary behaviors with all-cause mortality, using segmented Cox regression to empirically determine the break-points of the dose-response relationship. Data from 7006 adult participants aged 18 or above in the National Health and Nutrition Examination Survey waves 2003–2004 and 2005–2006 were included in the analysis and linked with death certificate data using a probabilistic matching approach in the National Death Index through December 31, 2011. Physical activity and sedentary behavior were measured using an ActiGraph model 7164 accelerometer worn over the right hip for 7 consecutive days. Minutes with accelerometer counts of <100, 1952–5724, and ≥5725 were classified as sedentary, moderate-intensity physical activity, and vigorous-intensity physical activity, respectively. Segmented Cox regression was used to estimate the hazard ratio (HR) for all-cause mortality associated with time spent in sedentary behaviors, moderate-intensity physical activity, and vigorous-intensity physical activity, adjusted for demographic characteristics, health behaviors, and health conditions. Data were analyzed in 2016. During 47,119 person-years of follow-up, 608 deaths occurred. Each additional hour per day of sedentary behaviors was associated with a HR of 1.15 (95% CI 1.01, 1.31) among participants who spent at least 10.9 h per day on sedentary behaviors, and each additional minute per day spent on moderate-intensity physical activity was associated with a HR of 0.94 (95% CI 0.91, 0.96) among participants with daily moderate-intensity physical activity ≤14.1 min. Associations of moderate physical activity and sedentary behaviors with all-cause mortality were independent of each other. To conclude, evidence from
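The per-minute classification rule stated in this abstract can be sketched as below. The thresholds come from the abstract; the function names, the `"unclassified"` label for counts of 100 to 1951 (a range the abstract does not assign to any category), and the sample counts are assumptions made for illustration.

```python
from collections import Counter


def classify_minute(count):
    """Map one minute's accelerometer count to the abstract's categories."""
    if count < 100:
        return "sedentary"
    if 1952 <= count <= 5724:
        return "moderate"
    if count >= 5725:
        return "vigorous"
    # 100-1951 is not given a category in the abstract (assumed label).
    return "unclassified"


def minutes_by_category(counts):
    """Total minutes spent in each category for a sequence of per-minute counts."""
    return Counter(classify_minute(c) for c in counts)


# Hypothetical per-minute counts, not real NHANES data.
sample = [0, 50, 99, 100, 1500, 1952, 3000, 5724, 5725, 9000]
totals = minutes_by_category(sample)
```

Summing such per-minute labels over a day gives the daily exposure times that enter the regression as predictors.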

9. Oncogene activation and tumor suppressor gene inactivation find their sites of expression in the changes in time and space of the age-adjusted cancer incidence rate.

PubMed

Kodama, M; Kodama, T; Murakami, M

2000-01-01

The purpose of the present investigation is to elucidate the relation between the distribution pattern of the age-adjusted incidence rate (AAIR) changes in time and space of 15 tumors of both sexes and the locations of centers of centripetal-(oncogene type) and centrifugal-(tumor suppressor gene type) forces. The fitness of the observed log AAIR data sets to the oncogene type- and the tumor suppressor gene type-equilibrium models and the locations of 2 force centers were calculated by applying the least squares method of Gauss to log AAIR pair data series with and without topological data manipulations, which are so designed as to let log AAIR pair data series fit to 2 variant (x, y) frameworks, the Rect-coordinates and the Para-coordinates. The 2 variant (x, y) coordinates are defined each as an (x, y) framework with its X axis crossed at a right angle to the regression line of the original log AAIR data (the Rect-coordinates) and as another framework with its X axis running parallel to the regression line of the original log AAIR pair data series (the Para-coordinates). The fitness test of log AAIR data series to either the oncogene activation type equilibrium model (r = -1.000) or the tumor suppressor gene inactivation type (r = 1.000) was conducted for each of the male-female type pair data and the female-male type data, for each of log AAIR changes in space and log AAIR changes in time, and for each of the 3 (x, y) frameworks in a given neoplasia of both sexes. The results obtained are given as follows: 1) The positivity rates of the fitness test to the oncogene type equilibrium model and the tumor suppressor gene type model were each 63.3% and 56.7% with the log AAIR changes in space, and 73.3% and 73.3% with log AAIR changes in time, as tested in 15 human neoplasias of both sexes. 2) Evidence was presented to indicate that the clearance of oncogene activation and tumor suppressor gene inactivation is the sine qua non premise of carcinogenesis. 3) The r

10. The Health Extension Program and Its Association with Change in Utilization of Selected Maternal Health Services in Tigray Region, Ethiopia: A Segmented Linear Regression Analysis

PubMed Central

Gebrehiwot, Tesfay Gebregzabher; San Sebastian, Miguel; Edin, Kerstin; Goicolea, Isabel

2015-01-01

Background: In 2003, the Ethiopian Ministry of Health established the Health Extension Program (HEP), with the goal of improving access to health care and health promotion activities in rural areas of the country. This paper aims to assess the association of the HEP with improved utilization of maternal health services in Northern Ethiopia using institution-based retrospective data. Methods: Average quarterly total attendances for antenatal care (ANC), delivery care (DC) and post-natal care (PNC) at health posts and health care centres were studied from 2002 to 2012. Regression analysis was applied to two models to assess whether trends were statistically significant. One model was used to estimate the level and trend changes associated with the immediate period of intervention, while changes related to the post-intervention period were estimated by the other. Results: The total number of consultations for ANC, DC and PNC increased constantly, particularly after the late-intervention period. Increases were higher for ANC and PNC at health post level and for DC at health centres. A statistically significant upward trend was found for DC and PNC in all facilities (p<0.01). The positive trend was also present in ANC at health centres (p = 0.04), but not at health posts. Conclusion: Our findings revealed an increase in the use of antenatal, delivery and post-natal care after the introduction of the HEP. We are aware that other factors that we could not control for might explain that increase. The figures for DC and PNC are, however, low, and more needs to be done to increase access to the health care system as well as the demand for these services by the population. Strengthening of the health information system in the region also needs to be prioritized. PMID:26218074
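The Methods above describe estimating level and trend changes around an intervention. A minimal sketch of a generic single-breakpoint segmented linear regression follows, assuming the common form y_t = b0 + b1·t + b2·post_t + b3·(t − start)·post_t fitted by ordinary least squares. The function names, the breakpoint `start`, and the noise-free synthetic series are hypothetical; the abstract does not give the authors' two-model specification, so this should not be read as their method.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_segmented(y, start):
    """OLS fit of y_t = b0 + b1*t + b2*post + b3*(t - start)*post.

    b2 is the level change at the breakpoint, b3 the change in slope.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= start else 0.0
        rows.append([1.0, float(t), post, post * (t - start)])
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic quarterly series (not data from the paper): baseline 100,
# slope 2 per quarter, then a level jump of 15 and a slope increase of 3
# starting at quarter 8.
start = 8
y = [100 + 2 * t + (15 + 3 * (t - start) if t >= start else 0) for t in range(20)]
b0, b1, b2, b3 = fit_segmented(y, start)
```

Because the synthetic series has no noise, the fitted coefficients recover the values used to build it. Real attendance data would also call for standard errors and a check for autocorrelation, neither of which this sketch computes.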