Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
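The same stepwise idea is easy to reproduce today in R; the sketch below uses step(), which selects terms by AIC rather than by the user-specified confidence level the NTRS program uses, and runs on made-up data:

    # Stepwise multiple linear regression on synthetic data (hypothetical).
    set.seed(1)
    n  <- 100
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # candidate predictors
    y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)         # x3 carries no signal
    full  <- lm(y ~ x1 + x2 + x3)
    # Backward-forward search; only statistically useful terms survive.
    final <- step(full, direction = "both", trace = 0)
    summary(final)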
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate simple linear regression. The first one is based on Galton's famous data set on heredity. We use the lm R command and obtain coefficient estimates, the residual standard error, R2, and residuals. In the second example, devoted to data on the vapor tension of mercury, we fit a simple linear regression, predict values, and look ahead to multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
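For readers without access to the exercise sheet, the first exercise's workflow can be sketched in a few lines of R using the Galton heights data from the HistData package (the package choice is ours, not the session's):

    # Simple linear regression on Galton's heredity data.
    # install.packages("HistData") if needed.
    library(HistData)
    fit <- lm(child ~ parent, data = Galton)
    summary(fit)          # coefficients, residual standard error, R^2
    coef(fit)             # intercept and slope
    head(residuals(fit))  # residuals, as in the exercise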
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Linearly Adjustable International Portfolios
NASA Astrophysics Data System (ADS)
Fonseca, R. J.; Kuhn, D.; Rustem, B.
2010-09-01
We present an approach to multi-stage international portfolio optimization based on the imposition of a linear structure on the recourse decisions. Multiperiod decision problems are traditionally formulated as stochastic programs. Scenario tree based solutions however can become intractable as the number of stages increases. By restricting the space of decision policies to linear rules, we obtain a conservative tractable approximation to the original problem. Local asset prices and foreign exchange rates are modelled separately, which allows for a direct measure of their impact on the final portfolio value.
Practical Session: Multiple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Three exercises are proposed to illustrate multiple linear regression. The first one investigates the influence of several factors on atmospheric pollution; it was proposed by D. Chessel and A.B. Dufour at Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data from 20 U.S. cities. Exercise 2 is an introduction to model selection, whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 were proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).
LRGS: Linear Regression by Gibbs Sampling
NASA Astrophysics Data System (ADS)
Mantz, Adam B.
2016-02-01
LRGS (Linear Regression by Gibbs Sampling) implements a Gibbs sampler to solve the problem of multivariate linear regression with uncertainties in all measured quantities and intrinsic scatter. LRGS extends an algorithm by Kelly (2007) that used Gibbs sampling for performing linear regression in fairly general cases in two ways: generalizing the procedure for multiple response variables, and modeling the prior distribution of covariates using a Dirichlet process.
Three-Dimensional Modeling in Linear Regression.
ERIC Educational Resources Information Center
Herman, James D.
Linear regression examines the relationship between one or more independent (predictor) variables and a dependent variable. By using a particular formula, regression determines the weights needed to minimize the error term for a given set of predictors. With one predictor variable, the relationship between the predictor and the dependent variable…
[From clinical judgment to linear regression model].
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we tend to assume these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful for predicting or showing the relationship between two or more variables, as long as the dependent variable is quantitative and normally distributed. Stated another way, regression is used to predict a measure based on knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line Y = a + bX, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when the variable "X" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R2) indicates the importance of the independent variables in the outcome.
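A small numerical illustration of Y = a + bX in R, with invented values standing in for clinical data:

    # Hypothetical data: X could be weight (kg), Y systolic pressure (mmHg).
    x <- c(45, 50, 55, 60, 65, 70)
    y <- c(110, 116, 121, 125, 132, 138)
    fit <- lm(y ~ x)
    coef(fit)                 # "a" (intercept) and "b" (slope)
    summary(fit)$r.squared    # coefficient of determination R2
    # b is the change in Y per one-unit change in X; a is Y at X = 0.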
Removing Malmquist bias from linear regressions
NASA Technical Reports Server (NTRS)
Verter, Frances
1993-01-01
Malmquist bias is present in all astronomical surveys in which sources are observed above an apparent brightness threshold. Sources that can be detected at progressively larger distances are progressively more limited to the intrinsically luminous portion of the true distribution. This bias does not distort any of the measurements, but it does distort the sample composition. We have developed the first treatment to correct for Malmquist bias in linear regressions of astronomical data. A demonstration of the corrected linear regression, which is computed in four steps, is presented.
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast, and multicollinearity. The regression schemes under consideration include the ordinary least-squares (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-squares method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial-condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR), which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
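The simplest of these schemes, the OLS correction, amounts to regressing past observations on past forecasts and applying the fitted line to new forecasts; a toy R sketch on synthetic data (not the paper's Lorenz '63 setup):

    set.seed(2)
    truth    <- rnorm(200)
    forecast <- 0.7 * truth + 0.5 + rnorm(200, sd = 0.3)  # biased, damped
    train <- 1:150; test <- 151:200
    ols <- lm(truth[train] ~ forecast[train])             # calibration fit
    corrected <- coef(ols)[1] + coef(ols)[2] * forecast[test]
    mean((forecast[test] - truth[test])^2)                # raw MSE
    mean((corrected - truth[test])^2)                     # corrected MSE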
MLREG, stepwise multiple linear regression program
Carder, J.H.
1981-09-01
This program is written in FORTRAN for an IBM computer and performs multiple linear regressions according to a stepwise procedure. The program transforms and combines old variables into new variables, and prints input and transformed data, sums, raw sums of squares, residual sum of squares, means and standard deviations, correlation coefficients, regression results at each step, ANOVA at each step, and predicted response results at each step. This package contains an EXEC used to execute the program, sample input data and an output listing, a source listing, documentation, and card decks containing the EXEC, sample input, and FORTRAN source.
A tutorial on Bayesian Normal linear regression
NASA Astrophysics Data System (ADS)
Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens
2015-12-01
Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
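For orientation, the closed-form results the tutorial refers to are those of the standard conjugate Normal-inverse-Gamma analysis; in generic textbook notation (not necessarily the paper's):

    $\beta \mid \sigma^2 \sim \mathrm{N}(\mu_0, \sigma^2 V_0), \qquad \sigma^2 \sim \mathrm{Inv}\text{-}\mathrm{Gamma}(a_0, b_0)$
    $V_n = (V_0^{-1} + X^\top X)^{-1}, \qquad \mu_n = V_n\,(V_0^{-1}\mu_0 + X^\top y)$
    $a_n = a_0 + n/2, \qquad b_n = b_0 + \tfrac{1}{2}\big(y^\top y + \mu_0^\top V_0^{-1}\mu_0 - \mu_n^\top V_n^{-1}\mu_n\big)$

With priors from this class the posterior is available in closed form, which is exactly what lets the tutorial avoid Markov Chain Monte Carlo.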
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Mental chronometry with simple linear regression.
Chen, J Y
1997-10-01
Typically, mental chronometry is performed by introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility, although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that, according to the dominant theories, involve a selective effect, and with a task (word superiority effect) that involves a global effect. The results indicate (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities, and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These results validate the regression approach as a useful method for mental chronometry. PMID:9347535
Multiple linear regression for isotopic measurements
NASA Astrophysics Data System (ADS)
Garcia Alonso, J. I.
2012-04-01
There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection of man-made variations using enriched isotopes as indicators. For both types of measurements, accurate and precise isotope ratio measurements are required. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr or Nd and Sm is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. This procedure also allows matrix separation, so that precise and accurate Sr and Nd isotope ratios can be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography with on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data resulted in Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for labelling individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique for a given individual, can be inherited in living organisms, and is stable. The detection of the isotopic signature is also based on multiple linear regression. The labelling of fish and its detection in otoliths by Laser Ablation ICP-MS will be discussed using trout and salmon as examples. In conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.
Estimation of adjusted rate differences using additive negative binomial regression.
Donoghoe, Mark W; Marschner, Ian C
2016-08-15
Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27073156
Sparse brain network using penalized linear regression
NASA Astrophysics Data System (ADS)
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
2011-03-01
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with an l1-norm penalty. The method is applied to brain networks consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of autism spectrum disorder (ASD) children and pediatric control (PedCon) subjects. To validate the results, we check the reproducibility of the obtained brain networks by leave-one-out cross-validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
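The per-node estimation step can be sketched with an off-the-shelf lasso solver; here glmnet stands in for the paper's own implementation, and synthetic numbers replace the FDG-PET ROI data:

    # Sparse neighbourhood of ROI i via l1-penalised regression (lasso).
    library(glmnet)
    set.seed(3)
    p <- 20; n <- 15                     # small-n large-p, as in the paper
    roi <- matrix(rnorm(n * p), n, p)    # synthetic ROI measurements
    i <- 1
    fit <- glmnet(roi[, -i], roi[, i], alpha = 1)  # alpha = 1: l1 penalty
    coef(fit, s = 0.1)   # nonzero entries mark the sparse connections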
A Gibbs sampler for multivariate linear regression
NASA Astrophysics Data System (ADS)
Mantz, Adam B.
2016-04-01
Kelly described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modelled by a flexible mixture of Gaussians rather than assumed to be uniform. Here, I extend the Kelly algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Secondly, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamically relaxed galaxy clusters as a function of their mass and redshift. An implementation of the Gibbs sampler in the R language, called LRGS, is provided.
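The core conditional-sampling loop is easy to see in a stripped-down case; the toy below handles only ordinary regression with flat priors, whereas LRGS adds measurement errors, their correlations, intrinsic scatter for multiple responses, and the Dirichlet-process covariate prior:

    set.seed(4)
    x <- rnorm(50); y <- 1 + 2 * x + rnorm(50, sd = 0.5)
    X <- cbind(1, x); n <- length(y); k <- ncol(X)
    XtXinv <- solve(t(X) %*% X)
    bhat   <- XtXinv %*% t(X) %*% y       # OLS solution
    sig2   <- 1
    draws  <- matrix(NA, 1000, 3)
    for (s in 1:1000) {
      # beta | sig2, y ~ Normal centred on the OLS solution
      beta <- bhat + t(chol(sig2 * XtXinv)) %*% rnorm(k)
      # sig2 | beta, y ~ scaled inverse-gamma (flat prior on beta, log sig2)
      sig2 <- 1 / rgamma(1, shape = n / 2,
                         rate = sum((y - X %*% beta)^2) / 2)
      draws[s, ] <- c(beta, sig2)
    }
    colMeans(draws[-(1:200), ])           # posterior means after burn-in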
Suppression Situations in Multiple Linear Regression
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
Augmenting Data with Published Results in Bayesian Linear Regression
ERIC Educational Resources Information Center
de Leeuw, Christiaan; Klugkist, Irene
2012-01-01
In most research, linear regression analyses are performed without taking into account published results (i.e., reported summary statistics) of similar previous studies. Although the prior density in Bayesian linear regression could accommodate such prior knowledge, formal models for doing so are absent from the literature. The goal of this…
Who Will Win?: Predicting the Presidential Election Using Linear Regression
ERIC Educational Resources Information Center
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
Local Linear Regression for Data with AR Errors
Li, Runze; Li, Yan
2009-01-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into local linear regression. Under the assumption that the error process is an autoregressive process, a new estimation procedure is proposed for nonparametric regression using the local linear regression method and profile least-squares techniques. We further propose the SCAD-penalized profile least-squares method to determine the order of the autoregressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedure and to compare it with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with a working-independence error structure. We illustrate the proposed methodology by an analysis of a real data set. PMID:20161374
A VBA-based Simulation for Teaching Simple Linear Regression
ERIC Educational Resources Information Center
Jones, Gregory Todd; Hagtvedt, Reidar; Jones, Kari
2004-01-01
In spite of the name, simple linear regression presents a number of conceptual difficulties, particularly for introductory students. This article describes a simulation tool that provides a hands-on method for illuminating the relationship between parameters and sample statistics.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is useful…
Learning a Nonnegative Sparse Graph for Linear Regression.
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung
2015-09-01
Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning an NNSG for linear regression, was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perceptiveness for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial. PMID:20559722
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts' law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
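The weighting idea can be sketched conceptually in R (all names hypothetical; the study used multivariate Gaussians with equal covariance, simplified here to independent features):

    # Scale a DOF's regression velocity by P(movement | EMG features).
    weight_dof <- function(features, w, classes) {
      # classes: list of list(mu=, sd=, moves=TRUE/FALSE), one per class
      lik  <- sapply(classes, function(cl) prod(dnorm(features, cl$mu, cl$sd)))
      post <- lik / sum(lik)                          # class probabilities
      p_move <- sum(post[sapply(classes, function(cl) cl$moves)])
      p_move * sum(w * features)   # probability-weighted regression output
    }
    cls <- list(list(mu = 0, sd = 1, moves = FALSE),  # no movement
                list(mu = 2, sd = 1, moves = TRUE),   # e.g. flexion
                list(mu = -2, sd = 1, moves = TRUE))  # e.g. extension
    weight_dof(0.1, w = 0.5, cls)  # near zero: extraneous drift suppressed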
A Bayesian approach to linear regression in astronomy
NASA Astrophysics Data System (ADS)
Sereno, Mauro
2016-01-01
Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical modelling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in the form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modelling. The R-package LIRA (LInear Regression in Astronomy) is made available to perform the regression.
Construction cost estimation of municipal incinerators by fuzzy linear regression
Chang, N.B.; Chen, Y.L.; Yang, H.H.
1996-12-31
Regression analysis has been widely used in engineering cost estimation. It is recognized that the fuzzy structure in cost estimation is a different type of uncertainty from the measurement error addressed in least-squares regression modeling. Hence, the uncertainties encountered in many construction and operating cost estimation and prediction problems cannot be fully depicted by conventional least-squares regression models. This paper presents a construction cost analysis of municipal incinerators using the techniques of fuzzy linear regression. A thorough investigation of construction costs in the Taiwan Resource Recovery Project was conducted based on design parameters such as design capacity, type of grate system, and the selected air pollution control process. The focus is placed upon the methodology for dealing with the heterogeneity of a set of observations for which the regression is evaluated.
Locally linear regression for pose-invariant face recognition.
Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen
2007-07-01
The variation of facial appearance due to viewpoint (pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One possible solution is to generate a virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple but efficient novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper: that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show the distinct advantage of the proposed method over the Eigen light-field method.
Some Applied Research Concerns Using Multiple Linear Regression Analysis.
ERIC Educational Resources Information Center
Newman, Isadore; Fraas, John W.
The intention of this paper is to provide an overall reference on how a researcher can apply multiple linear regression in order to utilize the advantages that it has to offer. The advantages and some concerns expressed about the technique are examined. A number of practical ways by which researchers can deal with such concerns as…
HIGH RESOLUTION FOURIER ANALYSIS WITH AUTO-REGRESSIVE LINEAR PREDICTION
Barton, J.; Shirley, D.A.
1984-04-01
Auto-regressive linear prediction is adapted to double the resolution of Angle-Resolved Photoemission Extended Fine Structure (ARPEFS) Fourier transforms. Even with the optimal taper (weighting function), the commonly used taper-and-transform Fourier method has limited resolution: it assumes the signal is zero beyond the limits of the measurement. By seeking the Fourier spectrum of an infinite extent oscillation consistent with the measurements but otherwise having maximum entropy, the errors caused by finite data range can be reduced. Our procedure developed to implement this concept adapts auto-regressive linear prediction to extrapolate the signal in an effective and controllable manner. Difficulties encountered when processing actual ARPEFS data are discussed. A key feature of this approach is the ability to convert improved measurements (signal-to-noise or point density) into improved Fourier resolution.
Assessing Longitudinal Change: Adjustment for Regression to the Mean Effects
ERIC Educational Resources Information Center
Rocconi, Louis M.; Ethington, Corinna A.
2009-01-01
Pascarella (J Coll Stud Dev 47:508-520, 2006) has called for an increase in use of longitudinal data with pretest-posttest design when studying effects on college students. However, such designs that use multiple measures to document change are vulnerable to an important threat to internal validity, regression to the mean. Herein, we discuss a…
Modeling pan evaporation for Kuwait by multiple linear regression.
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements with substantially continuous coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. The multiple linear regression technique is used with a variable-selection procedure to fit the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed, in order to linearize the existing curvilinear patterns of the data, by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results in reasonable agreement with observed values.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, Anne B.; Lizarraga, Joy S.
1996-01-01
Statistical operations termed model-adjustment procedures can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each procedure is a form of regression analysis in which the local data base is used as a calibration data set; the resulting adjusted regression models can then be used to predict storm-runoff quality at unmonitored sites. Statistical tests of the calibration data set guide selection among the proposed procedures.
New Robust Face Recognition Methods Based on Linear Regression
Mi, Jian-Xun; Liu, Jin-Xing; Wen, Jiajun
2012-01-01
Nearest subspace (NS) classification based on the linear regression technique is a very straightforward and efficient method for face recognition. A recently developed NS method, the linear regression-based classification (LRC), uses downsampled face images as features to perform face recognition. The basic assumption behind this kind of method is that samples from a certain class lie on their own class-specific subspace. Since there are only a few training samples for each individual class, the small sample size (SSS) problem arises and gives rise to misclassification in previous NS methods. In this paper, we propose two novel LRC methods using the idea that every class-specific subspace has its unique basis vectors. Thus, we consider that each class-specific subspace is spanned by two kinds of basis vectors: the common basis vectors shared by many classes and the class-specific basis vectors owned by one class only. Based on this concept, two classification methods, namely robust LRC 1 and 2 (RLRC 1 and 2), are given to achieve more robust face recognition. Unlike some previous methods, which need to extract class-specific basis vectors, the proposed methods are developed merely on the basis of the existence of the class-specific basis vectors, without actually calculating them. Experiments on three well-known face databases demonstrate very good performance of the new methods compared with other state-of-the-art methods. PMID:22879992
A mechanism for precise linear and angular adjustment utilizing flexures
NASA Technical Reports Server (NTRS)
Ellis, J. R.
1986-01-01
The design and development of a mechanism for precise linear and angular adjustment is described. This work was in support of the development of a mechanical extensometer for biaxial strain measurement. A compact mechanism was required which would allow angular adjustments about perpendicular axes with better than 0.001 degree resolution. The approach adopted was first to develop a means of precise linear adjustment. To this end, a mechanism based on the toggle principle was built with inexpensive and easily manufactured parts. A detailed evaluation showed that the resolution of the mechanism was better than 1 micron and that adjustments made by using the device were repeatable. In the second stage of this work, the linear adjustment mechanisms were used in conjunction with a simple arrangement of flexural pivots and attachment blocks to provide the required angular adjustments. Attempts to use the mechanism in conjunction with the biaxial extensometer under development proved unsuccessful. Any form of in situ adjustment was found to cause erratic changes in instrument output. These changes were due to problems with the suspension system. However, the subject mechanism performed well in its own right and appeared to have potential for use in other applications.
Prediction by linear regression on a quantum computer
NASA Astrophysics Data System (ADS)
Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco
2016-08-01
We give an algorithm for prediction on a quantum computer which is based on a linear regression model with least-squares optimization. In contrast to related previous contributions suffering from the problem of reading out the optimal parameters of the fit, our scheme focuses on the machine-learning task of guessing the output corresponding to a new input given examples of data points. Furthermore, we adapt the algorithm to process nonsparse data matrices that can be represented by low-rank approximations, and significantly improve the dependency on its condition number. The prediction result can be accessed through a single-qubit measurement or used for further quantum information processing routines. The algorithm's runtime is logarithmic in the dimension of the input space provided the data is given as quantum information as an input to the routine.
Morales, Esteban; de Leon, John Mark S.; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph
2016-01-01
Purpose The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. Methods We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, Glaucoma Hemifield Test (GHT), and weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values of the differences between the observed and the predicted thresholds at the last two follow-ups indicated better model predictions. Results The mean (SD) follow-up times for the smoothing and prediction phases were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Conclusions Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions; PER also predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. Translational Relevance The application of smoothing algorithms to VF data can improve forecasting at VF points to assist in treatment decisions. PMID:26998405
Adjustment of regional regression equations for urban storm-runoff quality using at-site data
Barks, C.S.
1996-01-01
Regional regression equations have been developed to estimate urban storm-runoff loads and mean concentrations using a national data base. Four statistical methods using at-site data to adjust the regional equation predictions were developed to provide better local estimates. The four adjustment procedures are a single-factor adjustment, a regression of the observed data against the predicted values, a regression of the observed values against the predicted values and additional local independent variables, and a weighted combination of a local regression with the regional prediction. Data collected at five representative storm-runoff sites during 22 storms in Little Rock, Arkansas, were used to verify, and, when appropriate, adjust the regional regression equation predictions. Comparison of observed values of storm-runoff loads and mean concentrations to the predicted values from the regional regression equations for nine constituents (chemical oxygen demand, suspended solids, total nitrogen as N, total ammonia plus organic nitrogen as N, total phosphorus as P, dissolved phosphorus as P, total recoverable copper, total recoverable lead, and total recoverable zinc) showed large prediction errors, ranging from 63 percent to more than several thousand percent. Prediction errors for 6 of the 18 regional regression equations were less than 100 percent and could be considered reasonable for water-quality prediction equations. The regression adjustment procedure was used to adjust five of the regional equation predictions to improve the predictive accuracy. For seven of the regional equations the observed and the predicted values are not significantly correlated. Thus neither the unadjusted regional equations nor any of the adjustments were appropriate. The mean of the observed values was used as a simple estimator when the regional equation predictions and adjusted predictions were not appropriate.
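Two of the four procedures are simple enough to sketch directly in R, assuming vectors obs and pred of local observations and regional-equation predictions (names hypothetical):

    # Single-factor adjustment: one multiplicative bias correction.
    single_factor <- function(obs, pred) {
      f <- mean(obs) / mean(pred)
      function(new_pred) f * new_pred
    }
    # Regression adjustment: observed values regressed on predicted values.
    regression_adjust <- function(obs, pred) {
      fit <- lm(obs ~ pred)
      function(new_pred) predict(fit, data.frame(pred = new_pred))
    }
    # adj <- regression_adjust(obs, pred); adj(new_regional_prediction)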
Outlier Detection In Linear Regression Using Standard Parity Space Approach
NASA Astrophysics Data System (ADS)
Mustafa Durdag, Utkan; Hekimoglu, Serif
2013-04-01
Despite all technological advancements, outliers may occur due to mistakes in engineering measurements. Before estimation of the unknown parameters, such outliers must be detected and removed from the measurements. There are two main outlier detection approaches: conventional tests based on the least-squares approach (e.g., Baarda, Pope) and robust tests (e.g., Huber, Hampel) are used to identify outliers in a set of measurements. The standard parity space approach is an important model-based Fault Detection and Isolation (FDI) technique commonly used in control engineering. In this study, the standard parity space method is used for outlier detection in linear regression. Our main goal is to compare, through Monte Carlo simulation, the success of the standard parity space method with that of the conventional tests in linear regression. The least-squares estimator is the most common estimator, as is well known, and it minimizes the sum of squared residuals. In the standard parity space approach, to eliminate the unknown vector, the measurement vector is projected onto the left null space of the coefficient matrix. Thus, the orthogonality condition of the parity vector is satisfied and only the effects of the noise vector remain. The residual vector is derived for two cases: the absence of an outlier and the occurrence of an outlier. Its likelihood function is used to determine the detection decision function for the global test. A localization decision function is calculated for each column of the parity matrix, and the maximum of these values is accepted as indicating an outlier. Results were obtained for two different intervals for the outlier generator, one between 3σ and 6σ (small outliers) and the other between 6σ and 12σ (large outliers), when the number of unknown parameters is chosen as 2 and 3. The mean success rate (MSR) of Baarda's method is better than that of the standard parity space method when the confidence intervals are
Radial basis function networks with linear interval regression weights for symbolic interval data.
Su, Shun-Feng; Chuang, Chen-Chia; Tao, C W; Jeng, Jin-Tsong; Hsiao, Chih-Ching
2012-02-01
This paper introduces a new structure of radial basis function networks (RBFNs) that can successfully model symbolic interval-valued data. In the proposed structure, to handle symbolic interval data, the Gaussian functions required in the RBFNs are modified to consider interval distance measure, and the synaptic weights of the RBFNs are replaced by linear interval regression weights. In the linear interval regression weights, the lower and upper bounds of the interval-valued data as well as the center and range of the interval-valued data are considered. In addition, in the proposed approach, two stages of learning mechanisms are proposed. In stage 1, an initial structure (i.e., the number of hidden nodes and the adjustable parameters of radial basis functions) of the proposed structure is obtained by the interval competitive agglomeration clustering algorithm. In stage 2, a gradient-descent kind of learning algorithm is applied to fine-tune the parameters of the radial basis function and the coefficients of the linear interval regression weights. Various experiments are conducted, and the average behavior of the root mean square error and the square of the correlation coefficient in the framework of a Monte Carlo experiment are considered as the performance index. The results clearly show the effectiveness of the proposed structure.
Robust linear regression with broad distributions of errors
NASA Astrophysics Data System (ADS)
Postnikov, Eugene B.; Sokolov, Igor M.
2015-09-01
We consider the problem of linear fitting of noisy data in the case of broad (say α-stable) distributions of random impacts ("noise"), which can lack even the first moment. This situation, common in statistical physics of small systems, in Earth sciences, in network science, and in econophysics, does not allow for application of conventional Gaussian maximum-likelihood estimators resulting in usual least-squares fits. Such fits lead to large deviations of fitted parameters from their true values due to the presence of outliers. The approaches discussed here aim at minimizing the width of the distribution of residuals. The corresponding width of the distribution can be defined either via the interquantile distance of the corresponding distributions or via the scale parameter in its characteristic function. The methods provide robust regression even in the case of short samples with large outliers, and are equivalent to the normal least-squares fit for Gaussian noise. Our discussion is illustrated by numerical examples.
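A minimal numerical sketch of the width-minimization idea in R, taking the interquartile range as the width measure and recovering the intercept from the residual median (both choices ours):

    set.seed(5)
    x <- runif(200, -5, 5)
    y <- 1 + 2 * x + rcauchy(200)   # heavy-tailed noise, no first moment
    # The IQR of the residuals is insensitive to outliers (and to the
    # intercept, which only shifts the residuals), so fit the slope first.
    iqr_width <- function(b) diff(quantile(y - b * x, c(0.25, 0.75)))
    b <- optimize(iqr_width, c(-10, 10))$minimum
    a <- median(y - b * x)          # centre the residual distribution
    c(intercept = a, slope = b)     # close to (1, 2) despite the outliers
    # An ordinary lm(y ~ x) fit can be pulled far off by the same data.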
Forecasting Groundwater Temperature with Linear Regression Models Using Historical Data.
Figura, Simon; Livingstone, David M; Kipfer, Rolf
2015-01-01
Although temperature is an important determinant of many biogeochemical processes in groundwater, very few studies have attempted to forecast the response of groundwater temperature to future climate warming. Using a composite linear regression model based on the lagged relationship between historical groundwater and regional air temperature data, empirical forecasts were made of groundwater temperature in several aquifers in Switzerland up to the end of the current century. The model was fed with regional air temperature projections calculated for greenhouse-gas emissions scenarios A2, A1B, and RCP3PD. Model evaluation revealed that the approach taken is adequate only when the data used to calibrate the models are sufficiently long and contain sufficient variability. These conditions were satisfied for three aquifers, all fed by riverbank infiltration. The forecasts suggest that with respect to the reference period 1980 to 2009, groundwater temperature in these aquifers will most likely increase by 1.1 to 3.8 K by the end of the current century, depending on the greenhouse-gas emissions scenario employed. PMID:25412761
Linear and nonlinear structural identifications using the support vector regression
NASA Astrophysics Data System (ADS)
Zhang, Jian; Sato, Tadanobu
2006-03-01
Robust and efficient identification methods need to be studied in the structural health monitoring field, especially when the I/O data are accompanied by high-level noise and the structure studied is a large-scale one. Support Vector Regression (SVR) is a promising nonlinear modeling method that has been found to work very well in many fields and has powerful potential for application in system identification. SVR-based methods are provided in this article for linear large-scale structural identification and nonlinear hysteretic structural identification. The LS estimator is a cornerstone of statistics but is less robust to outliers. Instead of the classical Gaussian loss function without regularization used in the LS method, a novel ε-insensitive loss function is employed in the SVR. Meanwhile, the SVR adopts the 'max-margin' idea to search for an optimum hyperplane separating the training data into two subsets by maximizing the margin between them. Therefore, the SVR-based structural identification approach is robust and accurate even when the observation data involve different kinds of high-level noise. By means of a local strategy, the linear large-scale structural identification approach based on the SVR is first investigated. The novel SVR can identify structural parameters directly by writing the structural observation equations as linear equations with respect to the unknown structural parameters. Furthermore, the substructural idea employed greatly reduces the number of unknown parameters, guaranteeing that the SVR works in a low dimension and focusing the identification on an arbitrary local subsystem. It is also crucial to perform nonlinear structural identification, because structures exhibit highly nonlinear behavior under severe loads such as strong seismic excitations. The Bouc-Wen model is often utilized to describe structural nonlinear properties; the power parameter of the model, however, is often assumed to be known even though it is unknown
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to performing linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
ERIC Educational Resources Information Center
Olejnik, Stephen; Mills, Jamie; Keselman, Harvey
2000-01-01
Evaluated the use of Mallows' C(p) and Wherry's adjusted R squared (R. Wherry, 1931) statistics to select a final model from a pool of model solutions using computer-generated data. Neither statistic identified the underlying regression model any better than, and usually less well than, the stepwise selection method, which itself was poor for…
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
ERIC Educational Resources Information Center
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Adjustable permanent quadrupoles for the next linear collider
James T. Volk et al.
2001-06-22
The proposed Next Linear Collider (NLC) will require over 1400 adjustable quadrupoles between the main linacs' accelerator structures. These 12.7 mm bore quadrupoles will have a range of integrated strength from 0.6 to 138 Tesla, with a maximum gradient of 141 Tesla per meter, an adjustment range of +0 to -20% and effective lengths from 324 mm to 972 mm. The magnetic center must remain stable to within 1 micron during the 20% adjustment. In an effort to reduce costs and increase reliability, several designs using hybrid permanent magnets have been developed. Four different prototypes have been built. All magnets have iron poles and use Samarium Cobalt to provide the magnetic fields. Two use rotating permanent magnetic material to vary the gradient, one uses a sliding shunt to vary the gradient and the fourth uses counter rotating magnets. Preliminary data on gradient strength, temperature stability, and magnetic center position stability are presented. These data are compared to an equivalent electromagnetic prototype.
ERIC Educational Resources Information Center
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
NASA Astrophysics Data System (ADS)
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
NASA Astrophysics Data System (ADS)
Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei
2014-10-01
Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and it also has difficulty capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e., the PROSPECT model) was employed to simulate leaf reflectance at wavelengths from 400 to 780 nm at 1 nm intervals, and these values were treated as data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car), and their ratio (Cab/Car) were each taken as targets to build the regression models. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car, and leaf mesophyll structures; 70% of these samples were used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log(1/r)) were each employed to build regression models. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R2) larger than 0.8 when 6 wavebands were selected to build the SMLR model. The largest R2 values for Cab, Car, and Cab/Car were 0.8845, 0.876, and 0.8765, respectively. Meanwhile, mathematical transformations of reflectance showed little influence on regression accuracy. We conclude that it is feasible to estimate chlorophyll, carotenoids, and their ratio with a statistical model based on leaf reflectance data.
Two biased estimation techniques in linear regression: Application to aircraft
NASA Technical Reports Server (NTRS)
Klein, Vladislav
1988-01-01
Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
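[Editorial illustration] One of the two biased techniques named here, principal components regression, is easy to sketch in R. The collinear data below are simulated, not the flight-test measurements.

    # Hedged sketch of principal components regression on collinear data.
    set.seed(2)
    n <- 100
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.01)     # nearly collinear regressor
    y  <- x1 + x2 + rnorm(n)
    X  <- cbind(x1, x2)
    pc <- prcomp(X, scale. = TRUE)     # eigensystem analysis of the regressors
    print(pc$sdev^2)                   # a tiny eigenvalue signals collinearity
    # Regress on the leading component only, then map back to the original scale
    z1  <- pc$x[, 1]
    fit <- lm(y ~ z1)
    beta_pcr <- coef(fit)[2] * pc$rotation[, 1] / apply(X, 2, sd)
    beta_pcr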
Divergent estimation error in portfolio optimization and in linear regression
NASA Astrophysics Data System (ADS)
Kondor, I.; Varga-Haszonits, I.
2008-08-01
The problem of estimation error in portfolio optimization is discussed, in the limit where the portfolio size N and the sample size T go to infinity such that their ratio is fixed. The estimation error strongly depends on the ratio N/T and diverges at a critical value of this parameter. This divergence is the manifestation of an algorithmic phase transition; it is accompanied by a number of critical phenomena and displays universality. As the structure of a large number of multidimensional regression and modelling problems is very similar to portfolio optimization, the scope of the above observations extends far beyond finance, and covers a large number of problems in operations research, machine learning, bioinformatics, medical science, economics, and technology.
NASA Astrophysics Data System (ADS)
Li, Guofa; Huang, Wei; Zheng, Hao; Zhang, Baoqing
2016-02-01
The spectral ratio method (SRM) is widely used to estimate the quality factor Q via linear regression of seismic attenuation under the assumption of a constant Q. However, estimation error is introduced when this assumption is violated. For a frequency-dependent Q described by a power-law function, we derived an analytical expression for the estimation error as a function of the power-law exponent γ and the ratio of the bandwidth to the central frequency, σ. Based on this theoretical analysis, we found that the estimation errors are dominated mainly by the exponent γ and less affected by the ratio σ. This implies that the accuracy of the Q estimate can hardly be improved by adjusting the width and range of the frequency band. Hence, we propose a two-parameter regression method to estimate the frequency-dependent Q from nonlinear seismic attenuation. The proposed method was tested using direct waves acquired by a near-surface cross-hole survey, and its reliability was evaluated in comparison with the result of SRM.
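[Editorial illustration] For context, the constant-Q SRM reduces to a single linear regression: ln[A2(f)/A1(f)] = c − (π Δt / Q) f, so Q is recovered from the fitted slope. A minimal R sketch with synthetic numbers (the Δt, Q, and frequency band are assumptions):

    # Constant-Q spectral ratio method as a linear regression on frequency.
    dt <- 0.1                               # travel-time difference (s), assumed
    Q_true <- 50
    f <- seq(10, 80, by = 2)                # frequency band (Hz)
    log_ratio <- 1.5 - pi * dt * f / Q_true + rnorm(length(f), sd = 0.02)
    fit <- lm(log_ratio ~ f)
    Q_hat <- -pi * dt / coef(fit)["f"]      # recover Q from the slope
    Q_hat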
NASA Astrophysics Data System (ADS)
van Gaans, P. F. M.; Vriend, S. P.
In geoscience applications, ridge regression is often a more appropriate technique than ordinary least-squares regression, especially when the predictor variables are highly intercorrelated. A FORTRAN 77 program, RIDGE, for ridged multiple linear regression is presented. The theory of linear regression and ridge regression is treated, to allow for a careful interpretation of the results and an understanding of the structure of the program. The program reports various parameters for evaluating the extent of multicollinearity within a given regression problem, such as the correlation matrix, multiple correlations among the predictors, variance inflation factors, eigenvalues, the condition number, and the determinant of the predictors' correlation matrix. No best method for the optimal choice of the ridge parameter has yet been established. Estimates of the ridge bias, ridged variance inflation factors, and estimates and norms for the ridge parameter are therefore given as output by RIDGE and should complement inspection of the ridge traces. Applications within the earth sciences are discussed.
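[Editorial illustration] The same diagnostics are straightforward to reproduce in R rather than FORTRAN 77. The sketch below (simulated collinear data, MASS package) is illustrative only and does not reproduce the RIDGE program's output.

    # Ridge trace over a grid of ridge parameters for collinear predictors.
    library(MASS)
    set.seed(3)
    n <- 80
    x1 <- rnorm(n); x2 <- x1 + rnorm(n, sd = 0.05); x3 <- rnorm(n)
    y  <- 2 * x1 + 2 * x2 + x3 + rnorm(n)
    # Condition number of the predictor correlation matrix flags multicollinearity
    kappa(cor(cbind(x1, x2, x3)))
    fit <- lm.ridge(y ~ x1 + x2 + x3, lambda = seq(0, 10, by = 0.1))
    matplot(fit$lambda, coef(fit)[, -1], type = "l",
            xlab = "ridge parameter", ylab = "coefficient")  # ridge trace
    select(fit)   # HKB, L-W, and GCV suggestions for the ridge parameter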
Identifying predictors of physics item difficulty: A linear regression approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained quantitative test results. In this study, we conducted a secondary analysis of data from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of the 123 physics items included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure was implemented. Finally, a regression model of physics item difficulty was created. It was shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge.
Zhang, Yiwei; Pan, Wei
2015-03-01
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Simultaneous Determination of Cobalt, Copper, and Nickel by Multivariate Linear Regression.
ERIC Educational Resources Information Center
Dado, Greg; Rosenthal, Jeffrey
1990-01-01
Presented is an experiment where the concentrations of three metal ions in a solution are simultaneously determined by ultraviolet-vis spectroscopy. Availability of the computer program used for statistically analyzing data using a multivariate linear regression is listed. (KR)
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Linear Regression in High Dimension and/or for Correlated Inputs
NASA Astrophysics Data System (ADS)
Jacques, J.; Fraix-Burnet, D.
2014-12-01
Ordinary least squares is the common way to estimate linear regression models. When inputs are correlated or too numerous, regression methods using derived input directions or shrinkage methods can be efficient alternatives. Methods using derived input directions build new uncorrelated variables as linear combinations of the initial inputs, whereas shrinkage methods introduce regularization and variable selection by penalizing the usual least-squares criterion. Both kinds of methods are presented and illustrated using the R software on an astronomical dataset.
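[Editorial illustration] A hedged sketch of both families on simulated data (the astronomical dataset itself is not reproduced here): shrinkage via the lasso in the glmnet package, and derived input directions via principal components.

    # Shrinkage vs. derived inputs on simulated data; glmnet assumed installed.
    library(glmnet)
    set.seed(4)
    n <- 150; p <- 20
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)
    # Shrinkage: lasso path with the penalty chosen by cross-validation
    cv <- cv.glmnet(X, y, alpha = 1)
    coef(cv, s = "lambda.min")           # many coefficients shrunk to zero
    # Derived inputs: regress on the first few principal components
    Z <- prcomp(X, scale. = TRUE)$x[, 1:5]
    summary(lm(y ~ Z))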
Graphical Description of Johnson-Neyman Outcomes for Linear and Quadratic Regression Surfaces.
ERIC Educational Resources Information Center
Schafer, William D.; Wang, Yuh-Yin
A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn…
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
NASA Astrophysics Data System (ADS)
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can differ under different lengths of the observation period. If we study the whole period through a sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We present a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequencies of the transmissions. The major patterns, the distance, and the medium in the process of transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
NASA Astrophysics Data System (ADS)
Zhu, Dazhou; Ji, Baoping; Meng, Chaoying; Shi, Bolin; Tu, Zhenhua; Qing, Zhaoshen
Hybrid linear analysis (HLA), partial least-squares (PLS) regression, and the linear least-squares support vector machine (LSSVM) were used to determine the soluble solids content (SSC) of apple by Fourier transform near-infrared (FT-NIR) spectroscopy. The performance of these three linear regression methods was compared. Results showed that HLA can be used for the analysis of complex solid samples such as apple. The predictive ability of the SSC model constructed by HLA was comparable to that of PLS. HLA was sensitive to outliers, so outliers should be eliminated before HLA calibration. The linear LSSVM performed better than PLS and HLA. Direct orthogonal signal correction (DOSC) pretreatment was effective for PLS and the linear LSSVM, but not suitable for HLA. The combination of DOSC and the linear LSSVM had good generalization ability and was not sensitive to outliers, so it is a promising method for linear multivariate calibration.
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of the previous multiple linear regression programs RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis-of-variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum-seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double-precision arithmetic.
On causal interpretation of race in regressions adjusting for confounding and mediating variables
VanderWeele, Tyler J.; Robinson, Whitney R.
2014-01-01
We consider several possible interpretations of the “effect of race” when regressions are run with race as an exposure variable, controlling also for various confounding and mediating variables. When adjustment is made for socioeconomic status early in a person’s life, we discuss under what contexts the regression coefficients for race can be interpreted as corresponding to the extent to which a racial inequality would remain if various socioeconomic distributions early in life across racial groups could be equalized. When adjustment is also made for adult socioeconomic status, we note how the overall racial inequality can be decomposed into the portion that would be eliminated by equalizing adult socioeconomic status across racial groups and the portion of the inequality that would remain even if adult socioeconomic status across racial groups were equalized. We also discuss a stronger interpretation of the “effect of race” (stronger in terms of assumptions) involving the joint effects of race-associated physical phenotype (e.g. skin color), parental physical phenotype, genetic background and cultural context when such variables are thought to be hypothetically manipulable and if adequate control for confounding were possible. We discuss some of the challenges with such an interpretation. Further discussion is given as to how the use of selected populations in examining racial disparities can additionally complicate the interpretation of the effects. PMID:24887159
Comparison of Linear and Non-Linear Regression Models to Estimate Leaf Area Index of Dryland Shrubs.
NASA Astrophysics Data System (ADS)
Dashti, H.; Glenn, N. F.; Ilangakoon, N. T.; Mitchell, J.; Dhakal, S.; Spaete, L.
2015-12-01
Leaf area index (LAI) is a key parameter in global ecosystem studies. LAI is considered a forcing variable in land surface processing models since ecosystem dynamics are highly correlated to LAI. In response to environmental limitations, plants in semiarid ecosystems have smaller leaf area, making accurate estimation of LAI by remote sensing a challenging issue. Optical remote sensing (400-2500 nm) techniques to estimate LAI are based either on radiative transfer models (RTMs) or statistical approaches. Considering the complex radiation field of dry ecosystems, simple 1-D RTMs lead to poor results, while inversion of more complex 3-D RTMs is a demanding task which requires the specification of many variables. A good alternative to physical approaches is using methods based on statistics. As with many natural phenomena, there is a non-linear relationship between LAI and the top-of-canopy electromagnetic waves reflected to optical sensors. Non-linear regression models can better capture this relationship. However, given the small number of observations relative to the size of the feature space (n ≪ p), non-linear models are prone to overfitting compared to linear models. In this study, linear versus non-linear regression techniques were investigated to estimate LAI. Our study area is located in southwestern Idaho, in the Great Basin. Sagebrush (Artemisia tridentata spp.) plays a critical role in maintaining the structure of this ecosystem. Using a leaf area meter (AccuPAR LP-80), LAI values were measured in the field. Linear partial least squares (PLS) regression and non-linear, tree-based Random Forest regression were implemented to estimate the LAI of sagebrush from hyperspectral data (AVIRIS-NG) collected in late summer 2014. Cross-validation of the results indicates that PLS can provide results comparable to Random Forest.
Algamal, Zakariya Yahya; Lee, Muhammad Hisyam
2015-12-01
Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimation of the gene coefficients and gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weights; however, using these weights may not be preferable for two reasons: first, the elastic net estimator is biased in selecting genes, and second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real-data results indicate that AAElastic is significantly more consistent in selecting genes than the three competing regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than the other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method for high-dimensional cancer classification.
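[Editorial illustration] The general adaptive-weighting idea (though not the specific AAElastic adjustment) can be sketched with glmnet's penalty.factor argument. Everything below is simulated and illustrative.

    # Adaptive elastic net sketch: initial elastic net estimates supply
    # per-gene weights for a second, weighted fit. Data are simulated.
    library(glmnet)
    set.seed(5)
    n <- 100; p <- 50
    X <- matrix(rnorm(n * p), n, p)
    y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))
    init <- cv.glmnet(X, y, family = "binomial", alpha = 0.5)
    b <- as.numeric(coef(init, s = "lambda.min"))[-1]   # drop intercept
    w <- 1 / (abs(b) + 1e-4)                            # adaptive weights
    fit <- cv.glmnet(X, y, family = "binomial", alpha = 0.5,
                     penalty.factor = w)
    sum(coef(fit, s = "lambda.min")[-1] != 0)           # genes selected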
Lopez, Michael J; Gutman, Roee
2014-11-28
Propensity score methods are common for estimating a binary treatment effect when treatment assignment is not randomized. When exposure is measured on an ordinal scale (i.e. low-medium-high), however, propensity score inference requires extensions which have received limited attention. Estimands of possible interest with an ordinal exposure are the average treatment effects between each pair of exposure levels. Using these estimands, it is possible to determine an optimal exposure level. Traditional methods, including dichotomization of the exposure or a series of binary propensity score comparisons across exposure pairs, are generally inadequate for identification of optimal levels. We combine subclassification with regression adjustment to estimate transitive, unbiased average causal effects across an ordered exposure, and apply our method on the 2005-2006 National Health and Nutrition Examination Survey to estimate the effects of nutritional label use on body mass index.
Li, Li; Brumback, Babette A; Weppelmann, Thomas A; Morris, J Glenn; Ali, Afsar
2016-08-15
Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-Statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. PMID:26892025
Lidauer, M H; Emmerling, R; Mäntysaari, E A
2008-06-01
A multiplicative random regression (M-RRM) test-day (TD) model was used to analyse daily milk yields from all available parities of German and Austrian Simmental dairy cattle. The method to account for heterogeneous variance (HV) was based on the multiplicative mixed model approach of Meuwissen. The variance model for the heterogeneity parameters included a fixed region x year x month x parity effect and a random herd x test-month effect with a within-herd first-order autocorrelation between test-months. Acceleration of variance model solutions after each multiplicative model cycle enabled fast convergence of adjustment factors and reduced total computing time significantly. Maximum likelihood estimation of within-strata residual variances was enhanced by inclusion of approximated information on the loss in degrees of freedom due to estimation of location parameters. This improved heterogeneity estimates for very small herds. The multiplicative model was compared with a model that assumed homogeneous variance. Re-estimated genetic variances, based on Mendelian sampling deviations, were homogeneous for the M-RRM TD model but heterogeneous for the homogeneous random regression TD model. Accounting for HV had a large effect on cow ranking but only a moderate effect on bull ranking.
NASA Astrophysics Data System (ADS)
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents prediction models that analyze and compute CO2 emissions in Malaysia. Each prediction model covers one of three main groups: transportation, electricity and heat production, and residential buildings together with commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. The best-subset method was used to remove irrelevant predictors, followed by multiple linear regression to produce the prediction models. From the results, high R-squared (prediction) values were obtained, implying that the models are reliable for predicting CO2 emissions from these data. In addition, the CO2 emissions from the three groups are forecasted using trend analysis plots for observation purposes.
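[Editorial illustration] A hedged sketch of this two-step procedure (best-subset screening, then MLR) using the leaps package; the variables below only mimic the groups named above and are not the World Bank data.

    # Best-subset selection followed by multiple linear regression.
    library(leaps)
    set.seed(6)
    n <- 60
    d <- data.frame(transport = rnorm(n), electricity = rnorm(n),
                    residential = rnorm(n), noise1 = rnorm(n), noise2 = rnorm(n))
    d$co2 <- 1.2 * d$transport + 0.8 * d$electricity + 0.5 * d$residential + rnorm(n)
    best <- regsubsets(co2 ~ ., data = d, nvmax = 5)
    s <- summary(best)
    # Keep the subset with the highest adjusted R-squared, then refit by MLR
    keep <- names(which(s$which[which.max(s$adjr2), -1]))
    summary(lm(reformulate(keep, "co2"), data = d))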
NASA Astrophysics Data System (ADS)
Wheeler, David C.; Calder, Catherine A.
2007-06-01
The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.
ERIC Educational Resources Information Center
Thompson, Russel L.
Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of heteroscedasticity and types of heteroscedasticity are discussed, and methods for correction are…
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course
ERIC Educational Resources Information Center
Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna
2010-01-01
Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…
Dufrenois, F; Noyer, J C
2013-02-01
Linear discriminant analysis, such as Fisher's criterion, is a statistical learning tool traditionally devoted to separating a training dataset into two or even several classes by way of linear decision boundaries. In this paper, we show that this tool can formalize the robust linear regression problem as a robust estimator does. More precisely, we develop a one-class Fisher's criterion whose maximization provides both the regression parameters and the separation of the data into two classes: typical data and atypical data, or outliers. This new criterion is built on the statistical properties of the subspace decomposition of the hat matrix. From this angle, we improve the discriminative properties of the hat matrix, which is traditionally used as an outlier diagnostic measure in linear regression. Naturally, we call this new approach the discriminative hat matrix. The proposed algorithm is fully unsupervised and needs only the initialization of one parameter. Synthetic and real datasets are used to study the performance of the proposed approach, both in terms of regression and of classification. We also illustrate its potential application to image recognition and fundamental matrix estimation in computer vision.
Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression
ERIC Educational Resources Information Center
Thomas, D. Roland; Zhu, PengCheng; Decady, Yves J.
2007-01-01
The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an…
ERIC Educational Resources Information Center
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma
2016-01-01
Aim: This research was conducted to determine the parameters most affecting the hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) in four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influential one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. The Alexandria strain had the significantly highest commercial hatchability (80.70%). Highly significant differences in hatching chick weight among strains were also observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: Predicting hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666
Sparse linear regression with elastic net regularization for brain-computer interfaces.
Kelly, John W; Degenhart, Alan D; Siewiorek, Daniel P; Smailagic, Asim; Wang, Wei
2012-01-01
This paper demonstrates the feasibility of decoding neuronal population signals using a sparse linear regression model with an elastic net penalty. In offline analysis of real electrocorticographic (ECoG) neural data, the elastic net achieved a timepoint decoding accuracy of 95% for classifying hand grasps vs. rest, and 82% for moving a cursor in 1-D space towards a target. These results were superior to those obtained using ℓ2-penalized and unpenalized linear regression, and marginally better than ℓ1-penalized regression. The elastic net and the ℓ1 penalty also produced sparse feature sets, but the elastic net did not eliminate correlated features, which could result in a more stable decoder for brain-computer interfaces.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. PMID:26774211
NASA Technical Reports Server (NTRS)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of the regression variables: 30-minute forward-averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield the best performance and avoid model discontinuity over day/night data boundaries.
Hoffman, Haydn; Lee, Sunghoon Ivan; Garst, Jordan H.; Lu, Derek S.; Li, Charles H.; Nagasawa, Daniel T.; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H.; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C.
2016-01-01
This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R2) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R2 = 0.452; MAD = 0.0887; p = 1.17 × 10−3). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R2 = 0.932; MAD = 0.0283; p = 5.73 × 10−12). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate. PMID:26115898
SERF: A Simple, Effective, Robust, and Fast Image Super-Resolver From Cascaded Linear Regression.
Hu, Yanting; Wang, Nannan; Tao, Dacheng; Gao, Xinbo; Li, Xuelong
2016-09-01
Example learning-based image super-resolution techniques estimate a high-resolution image from a low-resolution input image by relying on high- and low-resolution image pairs. An important issue for these techniques is how to model the relationship between high- and low-resolution image patches: most existing complex models either generalize poorly to diverse natural images or require a lot of time for model training, while simple models have limited representation capability. In this paper, we propose a simple, effective, robust, and fast (SERF) image super-resolver. The proposed super-resolver is based on a series of linear least-squares functions, namely cascaded linear regression. It has few parameters to control the model and is thus able to robustly adapt to different image datasets and experimental settings. The linear least-squares functions lead to closed-form solutions and therefore achieve computationally efficient implementations. To effectively decrease the gap between the estimated high-resolution patches and the ground truth, we group image patches into clusters via the k-means algorithm and learn a linear regressor for each cluster at each iteration. The cascaded learning process gradually decreases this gap in high-frequency detail and simultaneously obtains the linear regression parameters. Experimental results show that the proposed method achieves superior performance with lower time consumption than the state-of-the-art methods.
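[Editorial illustration] A toy sketch of one stage of such a cascade: cluster low-resolution patch features with k-means, then fit an ordinary least-squares regressor per cluster. Real image patches are replaced here by random vectors; this is not the SERF implementation.

    # One cascade stage: k-means clustering plus per-cluster least squares.
    set.seed(7)
    n <- 500; d <- 9
    lowres  <- matrix(rnorm(n * d), n, d)                # LR patch features
    highres <- lowres %*% rnorm(d) + rnorm(n, sd = 0.1)  # HR detail target
    km <- kmeans(lowres, centers = 4, nstart = 10)
    models <- lapply(1:4, function(k) {
      idx <- km$cluster == k
      lm.fit(cbind(1, lowres[idx, ]), highres[idx, ])    # closed-form LS fit
    })
    sapply(models, function(m) length(coef(m)))          # one regressor per cluster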
Kim, Dae-Hee; Choi, Jae-Hun; Lim, Myung-Eun; Park, Soo-Jun
2008-01-01
This paper suggests a method for correcting the distance between an ambient intelligence display and a user based on linear regression and a smoothing method, by which the distance of a user approaching the display can be accurately output even under unanticipated conditions, using a passive infrared (PIR) sensor and an ultrasonic device. The developed system consists of an ambient intelligence display, an ultrasonic transmitter, and a sensor gateway. The modules communicate with each other through RF (radio frequency) communication. The ambient intelligence display includes an ultrasonic receiver and a PIR sensor for motion detection. In particular, the system dynamically selects and applies algorithms such as smoothing or linear regression to the current input data through a judgment process that uses previous reliable data stored in a queue. In addition, we implemented GUI software in Java for real-time location tracking on the ambient intelligence display.
Island embryo regression driven by a beam of self-ions in the linear regime
NASA Astrophysics Data System (ADS)
Flynn, C. P.
2010-10-01
The kinetics of island growth and regression are discussed under the approximation of linear response, including the Gibbs-Thomson potential, for a reacting assembly of adatoms and advacancies (thermal defects) on a surface irradiated with a beam of self-ions. First, the quasistatic growth or shrinkage rate for islands of size n less than the critical size n̂ is calculated for the driven system, exactly, in linear response. This result is employed to determine successively: (i) the regression rate of driven embryo islands with n < n̂; and (ii) the structure of the steady-state decay chain established when embryos of a particular size n₀ < n̂ are created by ion-beam impacts. The changed embryo distribution caused by irradiation differs markedly from the equilibrium embryo populations.
User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)
Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.
2009-01-01
Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.
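[Editorial illustration] Not the WREG program itself, just the core idea in R: a regional regression of a flow statistic on basin characteristics, with record-length-based weights standing in for WREG's more elaborate weighting. All values are simulated.

    # Weighted multiple-linear regression for a regional estimation equation.
    set.seed(8)
    n <- 40
    gages <- data.frame(log_area   = rnorm(n, 2, 0.5),
                        log_precip = rnorm(n, 1, 0.2),
                        years      = sample(10:80, n, replace = TRUE))
    gages$log_q100 <- 0.9 * gages$log_area + 0.4 * gages$log_precip +
                      rnorm(n, sd = 0.2)
    fit <- lm(log_q100 ~ log_area + log_precip, data = gages,
              weights = years)          # longer records get more weight
    coef(fit)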
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; the coefficients of the regression model are then obtained using the generalized least-squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because of the non-parametric technique of local polynomial estimation, it is unnecessary to know the form of the heteroscedastic function, so we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
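[Editorial illustration] A simplified sketch of the two-stage idea: estimate the variance function nonparametrically from squared OLS residuals (loess here in place of multivariate local polynomial fitting), then refit by weighted least squares. Simulated data only.

    # Two-stage heteroscedastic regression: OLS, then variance-weighted refit.
    set.seed(9)
    n <- 300
    x <- runif(n, 0, 10)
    y <- 1 + 2 * x + rnorm(n, sd = 0.2 + 0.3 * x)   # variance grows with x
    ols  <- lm(y ~ x)
    vhat <- pmax(fitted(loess(residuals(ols)^2 ~ x)), 1e-6)
    gls  <- lm(y ~ x, weights = 1 / vhat)           # second-stage WLS/GLS
    rbind(ols = coef(ols), gls = coef(gls))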
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM10) levels. These levels are often increased over urban areas, becoming one of the main air pollution concerns, as is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7 years of historical data covering different urban environments (inland, city centre, and coastal sites). The explanatory variables are quantitative (log[NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour, and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC), and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following Sawa's Bayesian Information Criterion (INT-BIC). Each type of model is calculated over two different periods: the training dataset (6 years) and the testing dataset (1 year). The results of each type of model show that the INT-BIC-based model (R(2) = 0.42) is the best. Results were R of 0.65, 0.63, and 0.60 for the city centre, inland, and coastal sites, respectively, a level of confidence similar to state-of-the-art methodology. The related error calculated for longer time intervals diminished significantly with respect to shorter periods (R of 0.75-0.80 for monthly means and 0.80-0.98 for seasonal means). PMID:23247520
Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J
2015-08-01
Regression-based prosthesis control using surface electromyography (EMG) has demonstrated real-time simultaneous control of multiple degrees of freedom (DOFs) in transradial amputees. However, these systems have been limited to control of wrist DOFs. Use of intramuscular EMG has shown promise for both wrist and hand control in able-bodied subjects, but to date has not been evaluated in amputee subjects. The objective of this study was to evaluate two regression-based simultaneous control methods using intramuscular EMG in transradial amputees and compare their performance to able-bodied subjects. Two transradial amputees and sixteen able-bodied subjects used fine wire EMG recorded from six forearm muscles to control three wrist/hand DOFs: wrist rotation, wrist flexion/extension, and hand open/close. Both linear regression and probability-weighted regression systems were evaluated in a virtual Fitts' Law test. Though both amputee subjects initially produced worse performance metrics than the able-bodied subjects, the amputee subject who completed multiple experimental blocks of the Fitts' law task demonstrated substantial learning. This subject's performance was within the range of able-bodied subjects by the end of the experiment. Both amputee subjects also showed improved performance when using probability-weighted regression for targets requiring use of only one DOF, and mirrored statistically significant differences observed with able-bodied subjects. These results indicate that amputee subjects may require more learning to achieve similar performance metrics as able-bodied subjects. These results also demonstrate that comparative findings between linear and probability-weighted regression with able-bodied subjects reflect performance differences when used by the amputee population.
Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.
Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo
2015-08-01
Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.
Significance tests to determine the direction of effects in linear regression models.
Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander
2015-02-01
Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. PMID:24620829
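[Editorial illustration] The residual-skewness idea is easy to demonstrate: with a non-normal explanatory variable, residuals of the correctly specified model (Y on X) are more symmetric than those of the reverse model (X on Y). A hedged sketch with skewness computed by hand:

    # Direction of dependence via residual skewness; simulated data.
    skew <- function(e) mean((e - mean(e))^3) / sd(e)^3
    set.seed(10)
    n <- 1000
    x <- rexp(n) - 1                  # skewed explanatory variable
    y <- 0.5 * x + rnorm(n)           # true direction: x -> y
    c(correct = skew(residuals(lm(y ~ x))),
      reverse = skew(residuals(lm(x ~ y))))   # reverse residuals more skewed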
Linear regression models of floor surface parameters on friction between Neolite and quarry tiles.
Chang, Wen-Ruey; Matz, Simon; Grönqvist, Raoul; Hirvonen, Mikko
2010-01-01
For slips and falls, friction is widely used as an indicator of surface slipperiness. Surface parameters, including surface roughness and waviness, were shown to influence friction by correlating individual surface parameters with the measured friction. A collective input from multiple surface parameters as a predictor of friction, however, could provide a broader perspective on the contributions from all the surface parameters evaluated. The objective of this study was to develop regression models between the surface parameters and measured friction. The dynamic friction was measured using three different mixtures of glycerol and water as contaminants. Various surface roughness and waviness parameters were measured using three different cut-off lengths. The regression models indicate that the selected surface parameters can predict the measured friction coefficient reliably in most of the glycerol concentrations and cut-off lengths evaluated. The results of the regression models were, in general, consistent with those obtained from the correlation between individual surface parameters and the measured friction in eight out of nine conditions evaluated in this experiment. A hierarchical regression model was further developed to evaluate the cumulative contributions of the surface parameters in the final iteration by adding these parameters to the regression model one at a time from the easiest to measure to the most difficult to measure and evaluating their impacts on the adjusted R² values. For practical purposes, the surface parameter Ra alone would account for the majority of the measured friction even if it did not reach a statistically significant level in some of the regression models.
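A small sketch of the hierarchical step under hypothetical variable names: surface parameters enter the model one at a time, easiest-to-measure first, and the adjusted R² is tracked at each stage.

```python
# Hedged sketch: hierarchical regression tracking adjusted R^2 as parameters enter.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 60
Ra = rng.uniform(1, 10, n)                 # roughness average (easy to measure)
Rq = Ra * 1.1 + rng.normal(0, 0.5, n)      # a further, partly redundant parameter
Wa = rng.uniform(0, 2, n)                  # a waviness parameter (harder to measure)
friction = 0.05 * Ra + 0.02 * Wa + rng.normal(0, 0.1, n)

X = np.column_stack([Ra, Rq, Wa])
for k in range(1, X.shape[1] + 1):
    fit = sm.OLS(friction, sm.add_constant(X[:, :k])).fit()
    print(f"{k} parameter(s): adjusted R^2 = {fit.rsquared_adj:.3f}")
```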
Distributed Monitoring of the R² Statistic for Linear Regression
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.
2011-01-01
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large-scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
Multiple regression technique for Pth degree polynomials with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products; these programs evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered, and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent error are evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique; they show the output formats and typical plots comparing computer results to each set of input data.
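A hedged sketch of the two cases: building a Pth-degree design matrix from pure powers only, then optionally appending linear cross products, and fitting by least squares. The construction and fit are generic, not the FORTRAN programs described.

```python
# Hedged sketch: Pth-degree polynomial regression with and without cross products.
import numpy as np

def design_matrix(X, degree, cross_products=False):
    cols = [X ** p for p in range(1, degree + 1)]      # x, x^2, ..., x^P per variable
    if cross_products:
        n_vars = X.shape[1]
        cols += [X[:, [i]] * X[:, [j]]                 # pairwise linear cross products
                 for i in range(n_vars) for j in range(i + 1, n_vars)]
    return np.hstack([np.ones((X.shape[0], 1))] + cols)

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (100, 2))
y = 1 + 2 * X[:, 0] ** 2 + 0.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.05, 100)

for cp in (False, True):
    A = design_matrix(X, degree=3, cross_products=cp)
    coef, res, *_ = np.linalg.lstsq(A, y, rcond=None)
    print("with cross products" if cp else "powers only", "-> residual SS:", res)
```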
Robust best linear estimation for regression analysis using surrogate and instrumental variables
Wang, C. Y.
2012-01-01
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case–control study. PMID:22285992
Linear regression approach to study amino acid digestibility in broiler chickens.
Rodehutscord, M; Kapocius, M; Timmler, R; Dieckmann, A
2004-02-01
1. An experiment was conducted to investigate whether a linear regression approach is a suitable tool for determining the amino acid (AA) digestibility up to the terminal ileum of broiler chickens. Solvent-extracted rapeseed meal (RSM) was used as the model ingredient. 2. Ten diets with 5 different inclusion rates of RSM (60, 120, 180, 240 and 300 g/kg, corresponding to crude protein concentrations from 170 to 250 g/kg in the diet), each without or with a supplementation of phytase (500 U/kg), were fed ad libitum to broiler chickens between 14 and 21 d of age. Seven pens of 12 chickens were allocated to each treatment. Digesta were sampled on a pen basis from the section of the gastrointestinal tract between Meckel's diverticulum and 2 cm anterior to the ileo-caeco-colonic junction. Titanium dioxide was included as an indigestible marker. 3. The amounts of crude protein and AAs digested up to the terminal ileum increased consistently with increasing AA intake over the entire range of intakes. When the amount of an AA digested at the terminal ileum is linearly regressed against its intake, the deviation of the slope from 1 is caused by both the unabsorbed AA from RSM and the specific endogenous losses related to RSM. These slopes varied between 0.68 and 0.88 for individual AAs, and the slopes were unaffected by phytase supplementation. 4. It is suggested that a linear regression approach be adopted to study the AA digestibility of raw materials in chickens. Digestibility determined this way does not need any correction for basal endogenous loss.
An Empirical Likelihood Method for Semiparametric Linear Regression with Right Censored Data
Fang, Kai-Tai; Li, Gang; Lu, Xuyang; Qin, Hong
2013-01-01
This paper develops a new empirical likelihood method for semiparametric linear regression with a completely unknown error distribution and right censored survival data. The method is based on the Buckley-James (1979) estimating equation. It inherits some appealing properties of the complete data empirical likelihood method. For example, it does not require variance estimation which is problematic for the Buckley-James estimator. We also extend our method to incorporate auxiliary information. We compare our method with the synthetic data empirical likelihood of Li and Wang (2003) using simulations. We also illustrate our method using Stanford heart transplantation data. PMID:23573169
Pagowski, M O; Grell, G A; Devenyi, D; Peckham, S E; McKeen, S A; Gong, W; Monache, L D; McHenry, J N; McQueen, J; Lee, P
2006-02-02
Forecasts from seven air quality models and surface ozone data collected over the eastern USA and southern Canada during July and August 2004 provide a unique opportunity to assess benefits of ensemble-based ozone forecasting and devise methods to improve ozone forecasts. In this investigation, past forecasts from the ensemble of models and hourly surface ozone measurements at over 350 sites are used to issue deterministic 24-h forecasts using a method based on dynamic linear regression. Forecasts of hourly ozone concentrations as well as maximum daily 8-h and 1-h averaged concentrations are considered. It is shown that the forecasts issued with the application of this method have reduced bias and root mean square error and better overall performance scores than any of the ensemble members and the ensemble average. Performance of the method is similar to another method based on linear regression described previously by Pagowski et al., but unlike the latter, the current method does not require measurements from multiple monitors since it operates on individual time series. Improvement in the forecasts can be easily implemented and requires minimal computational cost.
NASA Astrophysics Data System (ADS)
Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.
2015-09-01
One method to acquire multispectral images is to sequentially capture a series of images, where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time-consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy for capturing multispectral data is to infer it from sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently, the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data; these assumptions may not always hold. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, an ensemble of decision trees trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.
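A compact sketch of the regression step under synthetic assumptions: multi-output random forest regression from sparse camera responses to a dense spectrum. In the paper the training pairs come from spectral characterization of the real optical system; here both spectra and filter sensitivities are random stand-ins.

```python
# Hedged sketch: inferring dense spectra from sparse measurements with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n_train, n_bands = 500, 31                   # e.g. 400-700 nm in 10 nm steps
spectra = rng.uniform(0, 1, (n_train, n_bands))          # dense reflectance spectra
sensitivities = rng.uniform(0, 1, (3, n_bands))          # 3 overlapping bandpass filters
rgb = spectra @ sensitivities.T              # sparse camera measurements

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(rgb, spectra)                      # multi-output regression

test_spectrum = rng.uniform(0, 1, (1, n_bands))
inferred = model.predict(test_spectrum @ sensitivities.T)
print("correlation with true spectrum:", np.corrcoef(inferred[0], test_spectrum[0])[0, 1])
```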
NASA Astrophysics Data System (ADS)
Magar, R. B.; Jothiprakash, V.
2011-12-01
In this study, a multi-linear regression (MLR) approach is used to construct an intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in the MLR approach, the Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also compared with autoregressive integrated moving average (ARIMA) models. MLR attempts to model the relationship between two or more independent variables and a dependent variable by fitting a linear regression equation. The main aim of the present study is to assess the development and applicability of simple models when sufficient data length is available. Out of 47 years of daily historical rainfall and reservoir inflow data, 33 years are used for building the model and 14 years for validating it. Based on the observed daily rainfall and reservoir inflow, various types of time-series, cause-effect and combined models are developed using lumped and distributed input data. Model performance was evaluated using various performance criteria, and it was found that, in the present case of well-correlated input data, both lumped and distributed MLR models perform equally well. For the case study considered, both MLR and ARIMA models also performed equally well, owing to the availability of a large dataset.
Aboveground biomass and carbon stocks modelling using non-linear regression model
NASA Astrophysics Data System (ADS)
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forests because of their diverse species composition and complex structure. Nevertheless, tropical rainforests hold the most extensive forests in the world, with a vast diversity of trees in layered canopies. Integrating optical sensor data with empirical models is a common way to assess AGB: regression links the remotely sensed data to biophysical parameters of the forest. This paper therefore examines the accuracy of a non-linear regression equation with a quadratic function for estimating AGB and carbon stocks in the tropical lowland Dipterocarp forest of the Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters from field plots and the remotely sensed data using a non-linear regression model. The results showed a good relationship between crown projection area (CPA) and carbon stocks (CS), with a Pearson correlation coefficient of r = 0.671 (p < 0.01). The study concluded that integrating Worldview-3 imagery with a LiDAR-based canopy height model (CHM) raster was useful for quantifying AGB and carbon stocks over a larger sample area of the lowland Dipterocarp forest.
NASA Technical Reports Server (NTRS)
Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.
1995-01-01
link between wavelengths chosen by stepwise regression and the biochemical of interest, and this in turn has cast doubts on the use of imaging spectrometry for the estimation of foliar biochemical concentrations at sites distant from the training sites. To investigate this problem, an analysis was conducted on the variation in canopy biochemical concentrations and reflectance spectra using forced entry linear regression.
Kew, William; Mitchell, John B O
2015-09-01
The application of Machine Learning to cheminformatics is a large and active field of research, but few papers discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, and neural network methods, together with both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. This investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the 'wisdom of crowds' principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better-performing models, which helps the greedy ensemble generally outperform the simpler linear ensemble. The choice of data preprocessing methodology was also found to be crucial to the performance of each method.
Gries, J M; Verotta, D
2000-08-01
In a frequently performed pharmacokinetics study, different subjects are given different doses of a drug. After each dose is given, drug concentrations are observed according to the same sampling design. The goal of the experiment is to obtain a representation for the pharmacokinetics of the drug, and to determine if drug concentrations observed at different times after a dose are linear with respect to dose. The goal of this paper is to obtain a representation for concentration as a function of time and dose, which (a) makes no assumptions about the underlying pharmacokinetics of the drug; (b) takes into account the repeated-measures structure of the data; and (c) detects nonlinearities with respect to dose. To address (a) we use a multivariate adaptive regression splines (MARS) representation, which we recast into a linear mixed-effects model, addressing (b). To detect nonlinearity we describe a general algorithm that obtains nested (mixed-effects) MARS representations. In the pharmacokinetics application, the algorithm obtains representations containing time, and time and dose, respectively, with the property that the basis functions of the first representation are a subset of the second. Standard statistical model selection criteria are used to select representations linear or nonlinear with respect to dose. The method can be applied to a variety of pharmacokinetic (and pharmacodynamic) preclinical and phase I-III trials. Examples of applications of the methodology to real and simulated data are reported.
Barks, C.S.
1995-01-01
Storm-runoff water-quality data were used to verify and, when appropriate, adjust regional regression models previously developed to estimate urban storm-runoff loads and mean concentrations in Little Rock, Arkansas. Data collected at 5 representative sites during 22 storms from June 1992 through January 1994 compose the Little Rock data base. Comparison of observed values (O) of storm-runoff loads and mean concentrations to the predicted values (Pu) from the regional regression models for nine constituents (chemical oxygen demand, suspended solids, total nitrogen, total ammonia plus organic nitrogen as nitrogen, total phosphorus, dissolved phosphorus, total recoverable copper, total recoverable lead, and total recoverable zinc) shows large prediction errors ranging from 63 to several thousand percent. Prediction errors for six of the regional regression models are less than 100 percent and can be considered reasonable for water-quality models. Differences between O and Pu are due to variability in the Little Rock data base and error in the regional models. Where applicable, a model adjustment procedure (termed MAP-R-P) based upon regression of O against Pu was applied to improve predictive accuracy. For 11 of the 18 regional water-quality models, O and Pu are significantly correlated, that is, much of the variation in O is explained by the regional models. Five of these 11 regional models consistently overestimate O; therefore, MAP-R-P can be used to provide a better estimate. For the remaining seven regional models, O and Pu are not significantly correlated, thus neither the unadjusted regional models nor MAP-R-P is appropriate. A simple estimator, such as the mean of the observed values, may be used if the regression models are not appropriate. Standard error of estimate of the adjusted models ranges from 48 to 130 percent. Calibration results may be biased due to the limited data set sizes in the Little Rock data base. The relatively large values of
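A toy sketch of the adjustment idea: regress observed values O against regional-model predictions Pu and, when the two are significantly correlated, use the fitted line as the adjusted estimator; otherwise fall back to a simple estimator such as the observed mean. Numbers are invented.

```python
# Hedged sketch of a MAP-R-P-style adjustment: regress observed against predicted.
import numpy as np
from scipy import stats

O = np.array([12.0, 30.0, 18.0, 55.0, 40.0, 25.0])    # observed loads (illustrative)
Pu = np.array([20.0, 52.0, 31.0, 90.0, 70.0, 44.0])   # regional-model predictions

res = stats.linregress(Pu, O)
if res.pvalue < 0.05:                                  # O and Pu correlated: adjust
    adjusted = res.intercept + res.slope * Pu
    print("adjusted estimates:", np.round(adjusted, 1))
else:                                                  # fall back to a simple estimator
    print("use mean of observed values:", O.mean())
```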
ERIC Educational Resources Information Center
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
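A minimal sketch of the idea: under the null hypothesis of no linear relationship (Gaussian errors), the R² of a regression with k predictors and n observations follows a Beta(k/2, (n−k−1)/2) distribution, so R² itself can serve as a test statistic. Values are illustrative.

```python
# Hedged sketch: R^2 as a test statistic via its null beta distribution.
from scipy import stats

n, k = 50, 3          # sample size and number of predictors (illustrative)
r2_observed = 0.21
p_value = stats.beta.sf(r2_observed, k / 2, (n - k - 1) / 2)
print(f"P(R^2 >= {r2_observed}) under H0 = {p_value:.4f}")
```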
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates the problems of interest; a brief literature review is also provided. The second chapter investigates the properties of the Lasso under long range dependent model errors. The Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution, and then show asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n), where p can increase exponentially with n. Finally, we show the consistency, more precisely the n^(1/2-d)-consistency, of the Lasso, along with the oracle property of the adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of the Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the
Measuring treatment and scale bias effects by linear regression in the analysis of OHI-S scores.
Moore, B J
1977-05-01
A linear regression model is presented for estimating unbiased treatment effects from OHI-S scores. An example is given to illustrate an analysis and to compare results of an unbiased regression estimator with those based on a biased simple difference estimator.
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
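A condensed sketch of the two stages under synthetic signals: an ordinary linear regression of each channel on a boxcar task regressor, then k-means clustering of the fitted task coefficients into activated and non-activated channels. The simulated channels and regressor shape are assumptions.

```python
# Hedged sketch: per-channel linear regression + k-means labelling of activations.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
t = np.arange(200)
task = (t % 40 < 20).astype(float)            # boxcar task regressor
channels = np.vstack([0.8 * task + rng.normal(0, 0.3, 200) if c < 4
                      else rng.normal(0, 0.3, 200) for c in range(10)])

X = np.column_stack([np.ones_like(task), task])
betas = np.linalg.lstsq(X, channels.T, rcond=None)[0][1]   # task coefficient per channel

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(betas.reshape(-1, 1))
print("task betas:", np.round(betas, 2))
print("cluster labels (activated vs not):", labels)
```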
Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing
2014-06-16
Most railway accidents happen at railway crossings. Detecting humans or objects present in the risk area of a railway crossing, and thus preventing accidents, is therefore an important task. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) a terrain drop compensation (TDC) technique solves the problem of the concavity of railway crossings; (2) a linear regression technique predicts the position and length of an object from image processing; (3) a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) obtains the ground points of the object. In addition, image preprocessing is applied to filter out noise and successfully improve the object detection. The experimental results demonstrate that our scheme is an effective method for the detection of railway crossing risk areas.
Maniquiz, Marla C; Lee, Soyoung; Kim, Lee-Hyung
2010-01-01
Rainfall is an important factor in estimating the event mean concentration (EMC), which is used to quantify the washed-off pollutant concentrations from non-point sources (NPSs). Pollutant loads could also be calculated using rainfall, catchment area and runoff coefficient. In this study, runoff quantity and quality data gathered from 28 months of monitoring at road and parking lot sites in Korea were evaluated using multiple linear regression (MLR) to develop equations for estimating pollutant loads and EMCs as a function of rainfall variables. The results revealed that total event rainfall and average rainfall intensity are possible predictors of pollutant loads. Overall, the models reflect the high uncertainties of NPSs; more accurate estimates of EMCs and loads may require water quality sampling or longer-term monitoring to gather more data for developing the estimation models.
Chicken barn climate and hazardous volatile compounds control using simple linear regression and PID
NASA Astrophysics Data System (ADS)
Abdullah, A. H.; Bakar, M. A. A.; Shukor, S. A. A.; Saad, F. S. A.; Kamis, M. S.; Mustafa, M. H.; Khalid, N. S.
2016-07-01
The hazardous volatile compounds from chicken manure in a chicken barn are a potential health threat to farm animals and workers. Ammonia (NH3) and hydrogen sulphide (H2S) produced in the chicken barn are influenced by climate changes. An electronic nose (e-nose) is used for sampling the barn's air, temperature and humidity data. Simple linear regression is used to identify the correlations between temperature and humidity, humidity and ammonia, and ammonia and hydrogen sulphide. MATLAB Simulink software was used for the sample data analysis using a PID controller. Results show that tuning the PID controller with the Ziegler-Nichols technique can improve the controller's ability to regulate the climate in the chicken barn.
Zare Abyaneh, Hamid
2014-01-01
This paper examined the efficiency of multivariate linear regression (MLR) and artificial neural network (ANN) models in predicting two major water quality parameters in a wastewater treatment plant. Biochemical oxygen demand (BOD) and chemical oxygen demand (COD), indirect indicators of organic matter, are representative parameters for sewer water quality. Performance of the models was evaluated using the coefficient of correlation (r), root mean square error (RMSE) and bias values. The values of BOD and COD computed by the ANN and regression models were in close agreement with their respective measured values. Results showed that the ANN model performed better than the MLR model. Comparative indices of the optimized ANN with input values of temperature (T), pH, total suspended solids (TSS) and total solids (TS) were RMSE = 25.1 mg/L and r = 0.83 for prediction of BOD, and RMSE = 49.4 mg/L and r = 0.81 for prediction of COD. It was found that the ANN model could be employed successfully in estimating the BOD and COD at the inlet of wastewater biochemical treatment plants. Moreover, sensitivity analysis showed that the pH parameter has a greater effect on BOD and COD prediction than the other parameters. Both models predicted BOD better than COD. PMID:24456676
Predicting students' success at pre-university studies using linear and logistic regressions
NASA Astrophysics Data System (ADS)
Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir
2014-09-01
The study aimed to find the most suitable model for predicting students' success in the medical pre-university studies at the Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were achievements in the national high school exit examination, Sijil Pelajaran Malaysia (SPM), in Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia, as well as gender and high school background. The outcomes showed a significant difference by gender in the final pre-university CGPA and in the Biology and Mathematics subjects, and a significant difference by high school background in the Mathematics subject. In general, the correlation between academic achievement at high school and at medical pre-university is moderately significant at an α-level of 0.05, except for the language subjects. It was also found that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models gave probabilities that closely matched the actual outcomes. Hence, they could be used to identify students who are qualified to enter the CUCMS medical faculty before accepting any students into its foundation program.
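A hedged sketch of the comparison under invented data: a logistic model of pass/fail against a thresholded linear regression of final CGPA. Feature names stand in for the SPM subject results.

```python
# Hedged sketch: linear vs. logistic regression for predicting student success.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(6)
n = 300
spm = rng.uniform(1, 9, (n, 4))                 # e.g. Biology, Chemistry, Physics, Maths
cgpa = 0.3 * spm.mean(axis=1) + rng.normal(0, 0.4, n)
passed = (cgpa >= 2.0).astype(int)

linear = LinearRegression().fit(spm, cgpa)      # predict CGPA, then threshold
logistic = LogisticRegression().fit(spm, passed)  # predict pass/fail directly

acc_linear = np.mean((linear.predict(spm) >= 2.0).astype(int) == passed)
acc_logistic = logistic.score(spm, passed)
print(f"linear-threshold accuracy: {acc_linear:.2f}, logistic accuracy: {acc_logistic:.2f}")
```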
NASA Astrophysics Data System (ADS)
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered as independent variables that can influence Real GDP (y): Consumers' Spending (x1), Government Spending (x2), Capital Formation (x3) and Imports (x4). The researchers used the normal estimation equations in matrix form to create the model for Real GDP, with α = 0.01. The researchers analyzed quarterly data from 1990 to 2013, acquired from the National Statistical Coordination Board (NSCB), giving a total of 96 observations per variable. The data, particularly the dependent variable (y), underwent a logarithmic transformation to satisfy the assumptions of multiple linear regression analysis. The mathematical model for Real GDP was formulated using matrices in MATLAB. Based on the results, only three of the independent variables are significant to the dependent variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The regression analysis shows that the model explains 98.7% (coefficient of determination) of the variation in the dependent variable. In a paired t-test (97.6%), the predicted values obtained from the model showed no significant difference from the actual values of Real GDP. This research will be essential in appraising forthcoming changes and aiding the Government in implementing policies for the development of the economy.
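A small sketch of the normal-equation computation the abstract describes, with a log-transformed dependent variable; the data are random placeholders, not NSCB figures, and Python stands in for MATLAB.

```python
# Hedged sketch: multiple linear regression via the normal equations, beta = (X'X)^-1 X'y.
import numpy as np

rng = np.random.default_rng(7)
n = 96                                        # quarterly data, 1990-2013
X = np.column_stack([np.ones(n),              # intercept column
                     rng.uniform(1, 5, (n, 4))])   # x1..x4: spending, capital formation, imports...
log_y = X @ np.array([2.0, 0.5, 0.1, 0.3, 0.2]) + rng.normal(0, 0.05, n)

beta = np.linalg.solve(X.T @ X, X.T @ log_y)  # normal estimation equation
fitted = X @ beta
ss_res = np.sum((log_y - fitted) ** 2)
ss_tot = np.sum((log_y - log_y.mean()) ** 2)
print("coefficients:", np.round(beta, 3), " R^2:", round(1 - ss_res / ss_tot, 3))
```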
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
ERIC Educational Resources Information Center
So, Tak-Shing Harry; Peng, Chao-Ying Joanne
This study compared the accuracy of predicting two-group membership obtained from K-means clustering with those derived from linear probability modeling, linear discriminant function, and logistic regression under various data properties. Multivariate normally distributed populations were simulated based on combinations of population proportions,…
Hoos, Anne B.; Patel, Anant R.
1996-01-01
Model-adjustment procedures were applied to the combined data bases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the data base. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. Standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee data base. The relatively large values of standard error of estimate for some of the constituent models, although representing significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. PMID:26720396
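A rough sketch of the model comparison: kernel-based regularized least squares is closely related to kernel ridge regression, so sklearn's KernelRidge with an RBF kernel stands in for KRLS against ordinary linear regression, scored by cross-validated R². Covariates are synthetic placeholders for the land use variables.

```python
# Hedged sketch: linear regression vs. a KRLS-like kernel method, cross-validated R^2.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n = 414                                        # road segments
X = rng.standard_normal((n, 5))                # population density, wind speed, ...
ufp = 10 + X @ np.array([2., 1., .5, .3, .2]) + np.sin(X[:, 0]) + rng.normal(0, 1, n)

for name, model in [("linear", LinearRegression()),
                    ("KRLS-like", KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1))]:
    r2 = cross_val_score(model, X, ufp, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.2f}")
```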
Jalal, Hawre; Goldhaber-Fiebert, Jeremy D; Kuntz, Karen M
2015-07-01
Decision makers often desire both guidance on the most cost-effective interventions given current knowledge and also the value of collecting additional information to improve the decisions made (i.e., from value of information [VOI] analysis). Unfortunately, VOI analysis remains underused due to the conceptual, mathematical, and computational challenges of implementing Bayesian decision-theoretic approaches in models of sufficient complexity for real-world decision making. In this study, we propose a novel practical approach for conducting VOI analysis using a combination of probabilistic sensitivity analysis, linear regression metamodeling, and unit normal loss integral function--a parametric approach to VOI analysis. We adopt a linear approximation and leverage a fundamental assumption of VOI analysis, which requires that all sources of prior uncertainties be accurately specified. We provide examples of the approach and show that the assumptions we make do not induce substantial bias but greatly reduce the computational time needed to perform VOI analysis. Our approach avoids the need to analytically solve or approximate joint Bayesian updating, requires only one set of probabilistic sensitivity analysis simulations, and can be applied in models with correlated input parameters. PMID:25840900
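A simplified, hedged reading of the parametric approach: fit a linear regression metamodel to incremental net benefit (INB) from probabilistic sensitivity analysis draws, treat the fitted INB as approximately normal, and evaluate EVPI with the unit normal loss integral L(z) = φ(z) − z(1 − Φ(z)). All numbers and the three-parameter setup are invented.

```python
# Hedged sketch: VOI via PSA + linear regression metamodel + unit normal loss integral.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
theta = rng.normal(0, 1, (5000, 3))                  # PSA draws of 3 parameters
inb = 200 + theta @ np.array([150., 80., 40.]) + rng.normal(0, 50, 5000)

X = np.column_stack([np.ones(5000), theta])
coef = np.linalg.lstsq(X, inb, rcond=None)[0]        # linear regression metamodel
fitted = X @ coef

mu, sigma = fitted.mean(), fitted.std()              # fitted INB ~ approx Normal(mu, sigma)
z = abs(mu) / sigma
evpi = sigma * (stats.norm.pdf(z) - z * stats.norm.sf(z))   # unit normal loss integral
print(f"EVPI per person ~ {evpi:.1f} (same monetary units as INB)")
```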
ERIC Educational Resources Information Center
Rocconi, Louis M.
2011-01-01
Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
NASA Astrophysics Data System (ADS)
Chu, A.; Zhuang, J.
2015-12-01
Wells and Coppersmith (1994) used fault data to fit simple linear regression (SLR) models that explain linear relations between moment magnitude and logarithms of fault measurements such as rupture length, rupture width, rupture area and surface displacement. Our work extends their analyses to multiple linear regression (MLR) models by considering two or more predictors with updated data. Treating the quantitative variables (rupture length, rupture width, rupture area and surface displacement) as predictors of magnitude, we find that the two-predictor model using rupture area and maximum displacement fits best. The next best alternative predictors are surface length and rupture area. Neither slip type nor slip direction is a significant predictor according to analysis of variance (ANOVA) and analysis of covariance (ANCOVA) fits. The corrected Akaike information criterion (Burnham and Anderson, 2002) is used as a model assessment criterion. Comparisons between the simple linear regression models of Wells and Coppersmith (1994) and our multiple linear regression models are presented. Our work uses fault data from Wells and Coppersmith (1994) and new data from Ellsworth (2000), Hanks and Bakun (2002, 2008), Shaw (2013), and the Finite-Source Rupture Model Database (http://equake-rc.info/SRCMOD/, 2015).
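A hedged sketch of the two-predictor comparison with the corrected AIC, AICc = AIC + 2k(k+1)/(n − k − 1); the fault measurements are simulated, not the study's compiled data.

```python
# Hedged sketch: one- vs. two-predictor magnitude regressions compared by AICc.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 80
log_area = rng.uniform(1, 4, n)                    # log10 rupture area (km^2), synthetic
log_disp = 0.5 * log_area + rng.normal(0, 0.3, n)  # log10 max displacement (m), synthetic
mag = 4.0 + 1.0 * log_area + 0.3 * log_disp + rng.normal(0, 0.2, n)

def aicc(fit):
    k = fit.df_model + 1                           # parameters incl. intercept
    return fit.aic + 2 * k * (k + 1) / (fit.nobs - k - 1)

one = sm.OLS(mag, sm.add_constant(log_area)).fit()
two = sm.OLS(mag, sm.add_constant(np.column_stack([log_area, log_disp]))).fit()
print(f"AICc one predictor: {aicc(one):.1f}, two predictors: {aicc(two):.1f}")
```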
Stratton, Kelly G; Cook, Andrea J; Jackson, Lisa A; Nelson, Jennifer C
2015-03-30
Sequential methods are well established for randomized clinical trials (RCTs), and their use in observational settings has increased with the development of national vaccine and drug safety surveillance systems that monitor large healthcare databases. Observational safety monitoring requires that sequential testing methods be better equipped to incorporate confounder adjustment and accommodate rare adverse events. New methods designed specifically for observational surveillance include a group sequential likelihood ratio test that uses exposure matching and a generalized estimating equations approach that involves regression adjustment. However, little is known about the statistical performance of these methods or how they compare to RCT methods in both observational and rare outcome settings. We conducted a simulation study to determine the type I error, power and time to surveillance end of the group sequential likelihood ratio test, the generalized estimating equations approach, and RCT methods that construct group sequential Lan-DeMets boundaries using data from a matched (group sequential Lan-DeMets-matching) or unmatched regression (group sequential Lan-DeMets-regression) setting. We also compared the methods using data from a multisite vaccine safety study. All methods had acceptable type I error, but regression methods were more powerful, faster at detecting true safety signals and less prone to implementation difficulties with rare events than exposure matching methods. Method performance also depended on the distribution of information and the extent of confounding by site. Our results suggest that the choice of sequential method, especially the confounder control strategy, is critical in rare event observational settings. These findings provide guidance for choosing methods in this context and, in particular, suggest caution when conducting exposure matching.
Coelho, Lúcia H G; Gutz, Ivano G R
2006-03-15
A chemometric method for the analysis of conductometric titration data was introduced to extend its applicability to lower concentrations and more complex acid-base systems. Auxiliary pH measurements were made during the titration to assist the calculation of the distribution of protonatable species on the basis of known or estimated equilibrium constants. Conductivity values of each ionized or ionizable species possibly present in the sample were introduced into a general equation in which the only unknown parameters were the total concentrations of (conjugated) bases and of strong electrolytes not involved in acid-base equilibria. All these concentrations were adjusted by a multiparametric nonlinear regression (NLR) method based on the Levenberg-Marquardt algorithm. This first conductometric titration method with NLR analysis (CT-NLR) was successfully applied to simulated conductometric titration data and to synthetic samples with multiple components at concentrations as low as those found in rainwater (approximately 10 µmol L⁻¹). It was possible to resolve and quantify mixtures containing a strong acid, formic acid, acetic acid, ammonium ion, bicarbonate and an inert electrolyte with an accuracy of 5% or better.
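A toy, heavily simplified version of the fitting step: conductivity modelled as a sum of ionic contributions whose speciation follows the measured pH, with the unknown total concentration adjusted by Levenberg-Marquardt via scipy. The single-acid system, constants and noise level are all assumptions.

```python
# Hedged toy sketch: Levenberg-Marquardt fit of a total concentration from
# conductivity data with pH-dependent speciation (single weak acid only).
import numpy as np
from scipy.optimize import least_squares

pH = np.linspace(3.5, 10.5, 40)                 # auxiliary pH readings along the titration
Ka = 1.8e-4                                     # formic acid dissociation constant (assumed known)
lam_anion, lam_H = 5.5e-3, 3.5e-2               # molar conductivities, S m^2/mol (illustrative)

def conductivity(c_total):
    frac_anion = Ka / (Ka + 10.0 ** (-pH))      # deprotonated fraction from pH and Ka
    return c_total * frac_anion * lam_anion + 10.0 ** (-pH) * lam_H

c_true = 1e-5                                   # rainwater-level concentration, mol/L
rng = np.random.default_rng(11)
kappa_obs = conductivity(c_true) + rng.normal(0, 1e-8, pH.size)

fit = least_squares(lambda c: conductivity(c[0]) - kappa_obs, x0=[1e-6], method="lm")
print(f"fitted total concentration: {fit.x[0]:.2e} mol/L (true {c_true:.2e})")
```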
Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting
Moglen, Glenn E.; Shivers, Dorianne E.
2006-01-01
A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques was used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage of the equations developed in this study is that they do not require field measurements of watershed characteristics as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made exclusive use of flow records necessary for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the
NASA Astrophysics Data System (ADS)
Shu, Yuqin; Lam, Nina S. N.
2011-01-01
Detailed estimates of carbon dioxide emissions at fine spatial scales are critical to both modelers and decision makers dealing with global warming and climate change. Globally, traffic-related emissions of carbon dioxide are growing rapidly. This paper presents a new method based on a multiple linear regression model to disaggregate traffic-related CO2 emission estimates from the parish (i.e. county) level to a 1 × 1 km grid scale. Considering the allocation factors (population density, urban area, income, road density) together, we used correlation and regression analysis to determine the relationship between these factors and traffic-related CO2 emissions, and developed the best-fit model. The method was applied to downscale the traffic-related CO2 emission values by parish for the State of Louisiana into 1 km² grid cells. Among the four parishes with the highest traffic-related CO2 emissions, East Baton Rouge contains both the largest area with above-average CO2 emissions and the smallest area with no CO2 emissions, while Orleans has the most CO2 emissions per unit area. The results reveal that high CO2 emissions are concentrated in the dense road networks of urban areas with high population density, and low CO2 emissions are distributed in rural areas with low population density and sparse road networks. The proposed method can be used to identify emission "hot spots" at a fine scale and is considered more accurate and less time-consuming than previous methods.
Non-linear regression model for spatial variation in precipitation chemistry for South India
NASA Astrophysics Data System (ADS)
Siva Soumya, B.; Sekhar, M.; Riotte, J.; Braun, Jean-Jacques
Chemical composition of rainwater changes from sea to inland under the influence of several major factors: the topographic location of the area, its distance from the sea, and the annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO3, NO3 and Mg do not change much from coast to inland, while SO4 and Ca changes are subject to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obey a power-law reduction with distance from the sea. Cl and Na decrease rapidly over the first 100 km from the sea, decrease marginally over the next 100 km, and later stabilize. Regression parameters estimated for the different cases were found to be consistent (R² ≈ 0.8). Variation in one of the parameters accounted for urbanization. The model was validated using data points from the southern peninsular region of the country; estimates were found to be within the 99.9% confidence interval. Finally, the relationship between the three parameters (rainfall amount, coastline distance, and concentration in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in south-west India. Chemistry estimated using the model correlated well with observed values, with a relative error of ~5%. Monthly variation in the chemistry is predicted from a downscaling model and compared with the observed data. Hence, the model developed for rain chemistry is useful for estimating concentrations at different spatio-temporal scales and is especially applicable to the south-west region of India.
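A minimal sketch of the power-law component: concentration decaying with coastline distance as C(d) = a·d^(−b), fitted by nonlinear regression. The distances and Cl concentrations are invented to mimic the rapid-then-stabilizing decay.

```python
# Hedged sketch: power-law decay of rain chemistry with distance from the coast.
import numpy as np
from scipy.optimize import curve_fit

distance_km = np.array([5., 20., 50., 100., 150., 200., 300., 500.])
cl_conc = np.array([95., 48., 27., 16., 13., 11., 10., 9.5])   # illustrative Cl (umol/L)

def power_law(d, a, b):
    return a * d ** (-b)

(a, b), _ = curve_fit(power_law, distance_km, cl_conc, p0=[100., 0.5])
print(f"Cl ~ {a:.1f} * d^-{b:.2f}: rapid drop in the first 100 km, then stabilises")
```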
Adjustable Permanent Quadrupoles Using Rotating Magnet Material Rods for the Next Linear Collider
James T. Volk et al.
2001-09-24
The proposed Next Linear Collider (NLC) will require over 1400 adjustable quadrupoles between the main linacs' accelerator structures. These 12.7 mm bore quadrupoles will have a range of integrated strength from 0.6 to 132 Tesla, a maximum gradient of 135 Tesla per meter, an adjustment range of +0/−20%, and effective lengths from 324 mm to 972 mm. The magnetic center must remain stable to within 1 micrometer during the 20% adjustment. In an effort to reduce estimated costs and increase reliability, several designs using hybrid permanent magnets have been developed. All magnets have iron poles and use either Samarium Cobalt or Neodymium Iron to provide the magnetic fields. Two prototypes use rotating rods containing permanent magnetic material to vary the gradient. Gradient changes of 20% and center shifts of less than 20 microns have been measured. These data are compared to an equivalent electromagnet prototype.
Turlapaty, Anish C.; Younan, Nicolas H.; Anantharaj, Valentine G.
2012-01-01
Currently, the only viable option for a global precipitation product is the merger of several precipitation products from different modalities. In this article, we develop a linear merging methodology based on spatiotemporal regression. Four high-resolution precipitation products (HRPPs), obtained through methods including the Climate Prediction Center's Morphing (CMORPH), Geostationary Operational Environmental Satellite-Based Auto-Estimator (GOES-AE), GOES-Based Hydro-Estimator (GOES-HE) and Self-Calibrating Multivariate Precipitation Retrieval (SCAMPR) algorithms, are used in this study. The merged data are evaluated against the Arkansas Red Basin River Forecast Center's (ABRFC's) ground-based rainfall product. The evaluation is performed using the Heidke skill score (HSS) for four seasons, from summer 2007 to spring 2008, and for two different rainfall detection thresholds. It is shown that the merged data outperform all the other products in seven out of eight cases. A key innovation of this machine learning method is that only 6% of the validation data are used for the initial training. The sensitivity of the algorithm to location, distribution of training data, selection of input data sets and seasons is also analysed and presented.
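The merging step can be sketched as an ordinary least-squares fit of ground truth on the four products over a small training subset; the stand-in data and weights below are hypothetical, not the article's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for the four satellite products (columns) at n cells.
n = 1000
products = rng.gamma(2.0, 2.0, size=(n, 4))
truth = products @ np.array([0.4, 0.2, 0.3, 0.1]) + rng.normal(0.0, 0.5, n)

# Train the linear merger on a small subset (about 6%, as in the abstract).
train = rng.choice(n, size=60, replace=False)
A = np.column_stack([products[train], np.ones(train.size)])
w, *_ = np.linalg.lstsq(A, truth[train], rcond=None)

# Apply the learned weights to merge the products everywhere.
merged = np.column_stack([products, np.ones(n)]) @ w
```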
Prediction of peptide retention at different HPLC conditions from multiple linear regression models.
Baczek, Tomasz; Wiczling, Paweł; Marszałł, Michał; Heyden, Yvan Vander; Kaliszan, Roman
2005-01-01
To quantitatively characterize the structure of a peptide and to predict its gradient retention time at given HPLC conditions, three structural descriptors are used: (i) the logarithm of the sum of retention times of the amino acids composing the peptide, log SumAA; (ii) the logarithm of the van der Waals volume of the peptide, log VDWVol; and (iii) the logarithm of the peptide's calculated n-octanol-water partition coefficient, clog P. The log SumAA descriptor is obtained from empirical data for 20 natural amino acids, determined in a given HPLC system. The two other descriptors are calculated from the peptides' structural formulas using molecular modeling methods. The quantitative structure-retention relationships (QSRR), built by multiple linear regression, describe HPLC retention of a peptide on a given chromatographic system on which the retention of the 20 amino acids was predetermined. A structurally diversified series of 98 peptides was employed. The predicted gradient retention times on several chromatographic systems were in good agreement with the experimental data. The QSRR equations, derived for a given system operated at variable gradient times and temperatures, allowed the prediction of peptide retention in that system. Matching the experimental HPLC retention to that theoretically predicted for a presumed peptide could facilitate original protein identification in proteomics. In conjunction with MS data, prediction of the retention time for a given peptide might be used to improve the confidence of peptide identifications and to increase the number of correctly identified peptides.
Fuzzy clustering and soft switching of linear regression models for reversible image compression
NASA Astrophysics Data System (ADS)
Aiazzi, Bruno; Alba, Pasquale S.; Alparone, Luciano; Baronti, Stefano
1998-10-01
This paper describes an original application of fuzzy logic to reversible compression of 2D and 3D data. The compression method consists of a space-variant prediction followed by context-based classification and arithmetic coding of the outcome residuals. Prediction of a pixel to be encoded is obtained from the fuzzy switching of a set of linear regression predictors. The coefficients of each predictor are calculated so as to minimize the prediction MSE for those pixels whose gray-level patterns, lying on a causal neighborhood of prefixed shape, are vectors belonging in a fuzzy sense to one cluster. In the 3D case, pixels both on the current slice and on previously encoded slices may be used. The size and shape of the causal neighborhood, as well as the number of predictors to be switched, may be chosen before running the algorithm and determine the trade-off between coding performance and computational cost. The method exhibits impressive performance for both 2D and 3D data, mainly thanks to the optimality of the predictors and their skill in fitting data patterns.
León, Larry F; Cai, Tianxi
2012-04-01
In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reflects model misspecification or natural variation. We illustrate the methods with a well-known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power to detect misspecification while at the same time controlling the size of the test.
Health-risk-based groundwater remediation system optimization through clusterwise linear regression.
He, L; Huang, G H; Lu, H W
2008-12-15
This study develops a health-risk-based groundwater management (HRGM) model. The model incorporates considerations of environmental quality and human health risks into a general framework. To solve the model, a proxy-based optimization approach is proposed, where a semiparametric statistical method (i.e., clusterwise linear regression) is used to create a set of rapid-response and easy-to-use proxy modules for capturing the relations between remediation policies and the resulting human health risks. By replacing the simulation and health risk assessment modules with the proxy ones, the computational cost can be reduced by many orders of magnitude. The model solutions reveal that (i) a long remediation period corresponds to a low total pumping rate, (ii) a stringent risk standard implies a high total pumping rate, and (iii) the human health risk associated with benzene would be significantly reduced if it is incorporated as a constraint in the model. These implications would assist decision makers in understanding the effects of remediation duration and human-health risk level on optimal remediation policies and in designing a robust groundwater remediation system. Results from post-optimization simulation show that the carcinogenic risk would decrease to satisfy the regulated risk standard under the given remediation policies.
A Fast Incremental Learning for Radial Basis Function Networks Using Local Linear Regression
NASA Astrophysics Data System (ADS)
Ozawa, Seiichi; Okamoto, Keisuke
To avoid catastrophic interference in incremental learning, we have proposed the Resource Allocating Network with Long-Term Memory (RAN-LTM). In RAN-LTM, not only new training data but also some memory items stored in long-term memory are trained, either by a gradient descent algorithm or by solving a linear regression problem. In the latter approach, radial basis function (RBF) centers are not trained but selected based on output errors when connection weights are updated. The proposed incremental learning algorithm belongs to the latter approach, where the errors not only for the training data but also for several retrieved memory items and pseudo training data are minimized to suppress catastrophic interference. The novelty of the proposed algorithm is that the connection weights to be learned are restricted based on RBF activation in order to improve efficiency in learning time and memory size. We evaluate the performance of the proposed algorithm in one-dimensional and multi-dimensional function approximation problems in terms of approximation accuracy, learning time, and average memory size. The experimental results demonstrate that the proposed algorithm learns quickly and performs well with a smaller memory size than memory-based learning methods.
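The regression-based weight update described above can be illustrated with a plain least-squares solve for RBF output weights; the centers, widths, and data below are illustrative assumptions, not the RAN-LTM procedure itself.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3.0, 3.0, 200)
y = np.sin(x) + rng.normal(0.0, 0.05, x.size)

# Fixed RBF centers (selected rather than trained, as in the abstract).
centers = np.linspace(-3.0, 3.0, 15)
width = 0.5
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

# Connection weights obtained by solving a linear least-squares problem.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w
print(np.mean((y - y_hat) ** 2))  # approximation error
```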
Optimization of end-members used in multiple linear regression geochemical mixing models
NASA Astrophysics Data System (ADS)
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
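A minimal sketch of a constrained least-squares mixing model in Python, using scipy's non-negative least squares in place of the paper's MATLAB codes; the end-member compositions below are invented for illustration.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical end-member compositions (rows: end-members, columns: elements).
E = np.array([[60.0, 15.0, 5.0],    # e.g., dust
              [70.0,  5.0, 2.0],    # e.g., ash
              [30.0,  2.0, 20.0]])  # e.g., hydrothermal material
sample = np.array([55.0, 9.5, 7.1]) # measured bulk sediment composition

# Non-negativity keeps the mixing fractions physically meaningful;
# normalizing afterwards makes them sum to one.
f, residual = nnls(E.T, sample)
fractions = f / f.sum()
print(fractions, residual)
```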
NASA Astrophysics Data System (ADS)
Fushimi, Akihiro; Kawashima, Hiroto; Kajihara, Hideo
Understanding the contribution of each emission source of air pollutants to ambient concentrations is important to establish effective measures for risk reduction. We have developed a source apportionment method based on an atmospheric dispersion model and multiple linear regression analysis (MLR) in conjunction with ambient concentrations simultaneously measured at points in a grid network. We used a Gaussian plume dispersion model developed by the US Environmental Protection Agency called the Industrial Source Complex model (ISC) in the method. Our method does not require emission amounts or source profiles. The method was applied to the case of benzene in the vicinity of the Keiyo Central Coastal Industrial Complex (KCCIC), one of the biggest industrial complexes in Japan. Benzene concentrations were simultaneously measured from December 2001 to July 2002 at sites in a grid network established in the KCCIC and the surrounding residential area. The method was used to estimate benzene emissions from the factories in the KCCIC and from automobiles along a section of a road, and then the annual average contribution of the KCCIC to the ambient concentrations was estimated based on the estimated emissions. The estimated contributions of the KCCIC were 65% inside the complex, 49% at 0.5-km sites, 35% at 1.5-km sites, 20% at 3.3-km sites, and 9% at a 5.6-km site. The estimated concentrations agreed well with the measured values. The estimated emissions from the factories and the road were slightly larger than those reported in the first Pollutant Release and Transfer Register (PRTR). These results support the reliability of our method. This method can be applied to other chemicals or regions to achieve reasonable source apportionments.
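The regression step can be sketched as follows, assuming hypothetical dispersion-model transfer coefficients; this illustrates the idea of recovering emission rates from concurrent grid measurements, not the study's actual ISC-based implementation.

```python
import numpy as np

# Hypothetical transfer coefficients from the dispersion model: ambient
# concentration at each site per unit emission from each source group.
D = np.array([[0.80, 0.10],   # site inside the complex
              [0.40, 0.30],
              [0.20, 0.50],
              [0.05, 0.20]])  # distant residential site
c_obs = np.array([4.2, 3.0, 2.6, 0.9])  # measured benzene concentrations

# Multiple linear regression with an intercept for the regional background.
A = np.column_stack([D, np.ones(len(c_obs))])
coef, *_ = np.linalg.lstsq(A, c_obs, rcond=None)
emissions, background = coef[:2], coef[2]
print(emissions, background)
```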
Fisher, Charles K.; Mehta, Pankaj
2014-01-01
Human-associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics, and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in time-series models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called "errors-in-variables". Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct "keystone species", Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome.
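A rough sketch of the core idea (sparse regression plus bootstrap aggregation for a discrete-time Lotka-Volterra model), using scikit-learn's Lasso on synthetic abundances; the actual LIMITS algorithm differs in its details, so treat this as a conceptual illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)

# Synthetic relative abundances for 5 taxa over 200 time points.
T, S = 200, 5
x = rng.dirichlet(np.ones(S), size=T)

# Discrete-time Lotka-Volterra: log(x[t+1]/x[t]) ≈ g + A @ x[t].
Y = np.log(x[1:] / x[:-1])
X = x[:-1]

# Sparse regression with bootstrap aggregation, one focal species at a time.
n_boot, A_hat = 50, np.zeros((S, S))
for i in range(S):
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, T - 1, size=T - 1)   # resample time points
        coefs.append(Lasso(alpha=0.01).fit(X[idx], Y[idx, i]).coef_)
    A_hat[i] = np.median(coefs, axis=0)            # aggregated interaction row
```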
Asquith, William H.; Roussel, Meghan C.
2009-01-01
Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds that satisfy other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations, and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station-specific peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed characteristics.
NASA Astrophysics Data System (ADS)
Tian, J. J.; Yao, Y.
2011-03-01
We report an experimental demonstration of a multiwavelength erbium-doped fiber laser with an adjustable wavelength number, based on a power-symmetric nonlinear optical loop mirror (NOLM) in a linear cavity. The intensity-dependent loss (IDL) induced by the NOLM is used to suppress mode competition and realize stable multiwavelength oscillation. Control of the wavelength number is achieved by adjusting the strength of the IDL, which depends on the pump power. As the pump power increases from 40 to 408 mW, 1 to 7 lasing lines at fixed wavelengths around 1601 nm are obtained. The output power stability is also investigated: the maximum power fluctuation of a single wavelength is less than 0.9 dB as the wavelength number is increased from 1 to 7.
Technology Transfer Automated Retrieval System (TEKTRAN)
Parametric non-linear regression (PNR) techniques are commonly used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
An Investigation of the Median-Median Method of Linear Regression
ERIC Educational Resources Information Center
Walters, Elizabeth J.; Morrell, Christopher H.; Auer, Richard E.
2006-01-01
Least squares regression is the most common method of fitting a straight line to a set of bivariate data. Another, less well-known method that is available on Texas Instruments graphing calculators is median-median regression. This method is proposed as a simple method that may be used with middle and high school students to motivate the idea of fitting…
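For reference, a small Python implementation of the usual textbook procedure; grouping conventions vary between textbooks and calculator models, so this is one common variant rather than the TI definition.

```python
import numpy as np

def median_median_line(x, y):
    """Median-median (resistant) line for bivariate data.

    Sketch of the textbook procedure: sort by x, split into three groups,
    take the median x and y within each group, set the slope from the two
    outer summary points, and shift the line through the mean of all three.
    """
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    n = len(x)
    k = n // 3 + (1 if n % 3 == 2 else 0)  # outer group size
    mx = [np.median(x[:k]), np.median(x[k:n - k]), np.median(x[n - k:])]
    my = [np.median(y[:k]), np.median(y[k:n - k]), np.median(y[n - k:])]
    slope = (my[2] - my[0]) / (mx[2] - mx[0])
    intercept = np.mean(my) - slope * np.mean(mx)
    return slope, intercept

print(median_median_line([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]))
```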
A New Test of Linear Hypotheses in OLS Regression under Heteroscedasticity of Unknown Form
ERIC Educational Resources Information Center
Cai, Li; Hayes, Andrew F.
2008-01-01
When the errors in an ordinary least squares (OLS) regression model are heteroscedastic, hypothesis tests involving the regression coefficients can have Type I error rates that are far from the nominal significance level. Asymptotically, this problem can be rectified with the use of a heteroscedasticity-consistent covariance matrix (HCCM)…
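A minimal illustration of the HCCM remedy mentioned above, using statsmodels' built-in HC3 covariance option on synthetic heteroscedastic data (the abstract's proposed test itself is not reproduced here).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0.0, 10.0, n)
# Error variance grows with x: heteroscedasticity of unknown form.
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.2 + 0.3 * x)

X = sm.add_constant(x)
fit_ols = sm.OLS(y, X).fit()               # classical standard errors
fit_hc3 = sm.OLS(y, X).fit(cov_type="HC3") # heteroscedasticity-consistent

print(fit_ols.bse)  # understates uncertainty here
print(fit_hc3.bse)  # HCCM-based standard errors
```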
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
ERIC Educational Resources Information Center
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
Confidence Intervals for an Effect Size Measure in Multiple Linear Regression
ERIC Educational Resources Information Center
Algina, James; Keselman, H. J.; Penfield, Randall D.
2007-01-01
The increase in the squared multiple correlation coefficient (ΔR²) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. The coverage probability that an asymptotic and percentile bootstrap confidence interval includes Δρ² was investigated. As expected,…
Ho Hoang, Khai-Long; Mombaur, Katja
2015-10-15
Dynamic modeling of the human body is an important tool to investigate the fundamentals of the biomechanics of human movement. To model the human body in terms of a multi-body system, it is necessary to know the anthropometric parameters of the body segments. For young healthy subjects, several data sets exist that are widely used in the research community, e.g. the tables provided by de Leva. No such comprehensive anthropometric parameter sets exist for elderly people. It is, however, well known that body proportions change significantly during aging, e.g. due to degenerative effects in the spine, such that parameters for young people cannot be used for realistically simulating the dynamics of elderly people. In this study, regression equations are derived from the inertial parameters, center of mass positions, and body segment lengths provided by de Leva to be adjustable to the changes in proportion of the body parts of male and female humans due to aging. Additional adjustments are made to the reference points of the parameters for the upper body segments, as these are chosen in a more practicable way in the context of creating a multi-body model in a chain structure with the pelvis representing the most proximal segment.
Worachartcheewan, Apilak; Nantasenamat, Chanin; Owasirikul, Wiwat; Monnor, Teerawat; Naruepantawart, Orapan; Janyapaisarn, Sayamon; Prachayasittikul, Supaluk; Prachayasittikul, Virapong
2014-02-12
A data set of 1-adamantylthiopyridine analogs (1-19) with antioxidant activity, comprising 2,2-diphenyl-1-picrylhydrazyl (DPPH) and superoxide dismutase (SOD) activities, was used for constructing quantitative structure-activity relationship (QSAR) models. Molecular structures were geometrically optimized at the B3LYP/6-31g(d) level and subjected to further molecular descriptor calculation using the Dragon software. Multiple linear regression (MLR) was employed for the development of QSAR models using 3 significant descriptors (i.e. Mor29e, F04[N-N] and GATS5v) for predicting the DPPH activity and 2 essential descriptors (i.e. EEig06r and Mor06v) for predicting the SOD activity. Such molecular descriptors accounted for the effects and positions of substituent groups (R) on the 1-adamantylthiopyridine ring. The results showed that high atomic electronegativity of a polar substituent group (R = CO2H) afforded high DPPH activity, while substituents with high atomic van der Waals volumes, such as R = Br, gave high SOD activity. Leave-one-out cross-validation (LOO-CV) and an external test set were used for model validation. The correlation coefficient (QCV) and root mean squared error (RMSECV) of the LOO-CV set for predicting DPPH activity were 0.5784 and 8.3440, respectively, while QExt and RMSEExt of the external test set corresponded to 0.7353 and 4.2721, respectively. Furthermore, QCV and RMSECV values of the LOO-CV set for predicting SOD activity were 0.7549 and 5.6380, respectively. The QSAR model's equation was then used to predict the SOD activity of the tested compounds, and these predictions were subsequently verified experimentally. It was observed that the experimental activity was more potent than the predicted activity. Structure-activity relationships of the significant descriptors governing antioxidant activity are also discussed. The QSAR models investigated herein are anticipated to be useful in the rational design and development of novel compounds with antioxidant activity.
Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D
2015-05-01
Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist of stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at the single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at the single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical activity.
NASA Technical Reports Server (NTRS)
Parker, Peter A.; Vining, G. Geoffrey; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
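The two approaches can be sketched in a few lines; the simulated instrument below is an assumption used only to contrast forward-then-inverse calibration with direct reverse regression.

```python
import numpy as np

rng = np.random.default_rng(6)
standards = np.linspace(1.0, 10.0, 10)   # known reference values
readings = 2.0 * standards + 0.5 + rng.normal(0.0, 0.1, standards.size)

# Forward regression: reading = a + b * standard, then inverted for use.
b, a = np.polyfit(standards, readings, 1)
measure_forward = lambda r: (r - a) / b

# Reverse regression: standard = c + d * reading, used directly
# (but the regressor now carries the measurement error).
d, c = np.polyfit(readings, standards, 1)
measure_reverse = lambda r: c + d * r

print(measure_forward(10.0), measure_reverse(10.0))
```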
Chen, Jinbo; Yu, Kai; Hsing, Ann; Therneau, Terry M
2007-04-01
The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low-order gene-gene interactions but not for exploring complex higher-order interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty in modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree-based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models. A PLTR model quantifies joint effects of genes and other risk factors by a combination of linear main effects and a non-parametric tree structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying and testing the significance of the optimal "pruned" tree nested within the tree resulting from the fitting algorithm. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population-based case-control study. The analysis yielded an interesting parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene-environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research.
Arora, Amarpreet S; Reddy, Akepati S
2014-01-01
Stormwater management at the urban sub-watershed level has been envisioned to include stormwater collection, treatment, and disposal of treated stormwater through groundwater recharging. Sizing, operation and control of stormwater management systems require information on the quantities and characteristics of the stormwater generated. Stormwater characteristics depend upon the dry spell between two successive rainfall events, the intensity of rainfall, and watershed characteristics. However, sampling and analysis of stormwater spanning only a few rainfall events provides insufficient information on these characteristics. An attempt has been made in the present study to assess stormwater characteristics through regression modeling. Stormwater from five sub-watersheds of Patiala city was sampled and analyzed. The results obtained were related to the antecedent dry periods and to the intensity of the rainfall events through regression modeling. The obtained regression models were used to assess stormwater quality for various antecedent dry periods and rainfall event intensities.
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
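A toy contrast between ordinary least squares and one of the biased estimators discussed (ridge), on a deliberately near-singular design; the data are synthetic and the point is qualitative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0.0, 1e-3, n)   # nearly collinear: near-singular normal equations
X = np.column_stack([x1, x2])
y = x1 + rng.normal(0.0, 0.1, n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print(ols.coef_)    # typically large, mutually cancelling coefficients
print(ridge.coef_)  # the biased estimator constrains the parameter space
```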
Yang, Xiwu; Wang, Tianming
2013-11-21
Originating from differences in sequence length, both k-word based methods and graphical representation approaches have uncovered biological information in their own distinct ways. However, it is unlikely that the mechanisms of information storage vary with sequence length, so a similarity distance suitable for sequences of various lengths should lie much closer to those mechanisms. In this paper, new sub-sequences of k-words were extracted from biological sequences under a one-to-one mapping. The new sub-sequences were evaluated by a linear regression model. Moreover, a new distance was defined on the invariants from the linear regression model. In comparison with other alignment-free distances, the results of four experiments demonstrated that our similarity distance is more efficient.
ERIC Educational Resources Information Center
Baker, Bruce D.; Richards, Craig E.
1999-01-01
Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
Bertrand-Krajewski, J L
2004-01-01
In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers by means of sensor- and site-specific empirical equations, established by linear regression of on-site turbidity values T against TSS concentrations C measured in corresponding samples. As the ordinary least-squares method is not able to account for measurement uncertainties in both the T and C variables, an appropriate regression method is used to resolve this difficulty and to correctly evaluate the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of variances and covariance in the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. In order to show how the established regression could be used, an independent 24-hour dry weather turbidity data series recorded at a 2-min time interval is transformed into estimated TSS concentrations and compared to TSS concentrations measured in samples. The comparison is satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.
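The paper's detailed variance calculations are not reproduced here, but a generic errors-in-variables line fit (Deming regression) conveys the idea; delta is an assumed error-variance ratio, and the numbers are hypothetical.

```python
import numpy as np

def deming_fit(T, C, delta=1.0):
    """Errors-in-variables straight-line fit (Deming regression).

    A generic sketch, not the exact method of the paper: delta is the
    assumed ratio of the error variances of C and T.
    """
    T, C = np.asarray(T, float), np.asarray(C, float)
    sTT = np.mean((T - T.mean()) ** 2)
    sCC = np.mean((C - C.mean()) ** 2)
    sTC = np.mean((T - T.mean()) * (C - C.mean()))
    disc = (sCC - delta * sTT) ** 2 + 4.0 * delta * sTC ** 2
    slope = (sCC - delta * sTT + np.sqrt(disc)) / (2.0 * sTC)
    intercept = C.mean() - slope * T.mean()
    return slope, intercept

print(deming_fit([10, 20, 30, 40], [52, 98, 151, 205]))
```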
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression.
Beckstead, Jason W
2012-03-30
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic strategy to isolate, examine, and remove suppression effects has been offered. In this article such an approach, rooted in confirmatory factor analysis theory and employing matrix algebra, is developed. Suppression is viewed as the result of criterion-irrelevant variance operating among predictors. Decomposition of predictor variables into criterion-relevant and criterion-irrelevant components using structural equation modeling permits derivation of regression weights with the effects of criterion-irrelevant variance omitted. Three examples with data from applied research are used to illustrate the approach: the first assesses child and parent characteristics to explain why some parents of children with obsessive-compulsive disorder accommodate their child's compulsions more so than do others, the second examines various dimensions of personal health to explain individual differences in global quality of life among patients following heart surgery, and the third deals with quantifying the relative importance of various aptitudes for explaining academic performance in a sample of nursing students. The approach is offered as an analytic tool for investigators interested in understanding predictor-criterion relationships when complex patterns of intercorrelation among predictors are present and is shown to augment dominance analysis.
Creating a non-linear total sediment load formula using polynomial best subset regression model
NASA Astrophysics Data System (ADS)
Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali
2016-08-01
The aim of this study is to derive a new total sediment load formula which is more accurate and has fewer application constraints than the well-known formulae of the literature. The five best-known stream-power-concept sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach, called Polynomial Best Subset Regression (PBSR) analysis. The aim of the PBSR analysis is to fit and test all possible combinations of the input variables and to select the best subset. All input variables, together with their second and third powers, are included in the regression to test possible relations between the explanatory variables and the dependent variable. While selecting the best subset, a multistep approach is used that depends on significance values and on the degree of multicollinearity among inputs. The new formula is compared to the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab subsets within this holdout data. Different goodness-of-fit statistics are used, as they represent different perspectives on model accuracy. After the detailed comparisons, we identify the most accurate equation, which is also applicable to both flume and river data. On the field dataset in particular, the prediction performance of the proposed formula outperformed the benchmark formulations.
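A compact sketch of best-subset selection over a polynomial candidate pool, scored here by adjusted R²; the study's own multistep significance and multicollinearity screening is more elaborate, and the data below are synthetic.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 300
base = rng.random((n, 3))  # dimensionless input parameters
y = 2.0 * base[:, 0] + base[:, 1] ** 2 + rng.normal(0.0, 0.05, n)

# Candidate pool: every input with its second and third powers.
pool = np.hstack([base, base ** 2, base ** 3])

def adj_r2(r2, n_obs, k):
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - k - 1)

best_score, best_subset = -np.inf, None
for k in range(1, 5):                       # subset sizes to try
    for subset in combinations(range(pool.shape[1]), k):
        Xs = pool[:, list(subset)]
        r2 = LinearRegression().fit(Xs, y).score(Xs, y)
        score = adj_r2(r2, n, k)
        if score > best_score:
            best_score, best_subset = score, subset
print(best_subset, best_score)
```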
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.
ERIC Educational Resources Information Center
Vidal, Sherry
Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the squared differences between observed and predicted values of a dependent variable as a linear function of the independent variables. From a computer printout…
The rotational feedback on linear-momentum balance in glacial isostatic adjustment
NASA Astrophysics Data System (ADS)
Martinec, Zdenek; Hagedoorn, Jan
2015-04-01
The influence of changes in surface ice-mass redistribution and associated viscoelastic response of the Earth, known as glacial-isostatic adjustment (GIA), on the Earth's rotational dynamics has long been known. Equally important is the effect of the changes in the rotational dynamics on the viscoelastic deformation of the Earth. This signal, known as the rotational feedback, or more precisely, the rotational feedback on the sea-level equation, has been mathematically described by the sea-level equation extended for the term that is proportional to perturbation in the centrifugal potential and the second-degree tidal Love number. The perturbation in the centrifugal force due to changes in the Earth's rotational dynamics enters not only into the sea-level equation, but also into the conservation law of linear momentum such that the internal viscoelastic force, the perturbation in the gravitational force and the perturbation in the centrifugal force are in balance. Adding the centrifugal-force perturbation to the linear-momentum balance creates an additional rotational feedback on the viscoelastic deformations of the Earth. We term this feedback mechanism the rotational feedback on the linear-momentum balance. We extend both the time-domain method for modelling the GIA response of laterally heterogeneous earth models and the traditional Laplace-domain method for modelling the GIA-induced rotational response to surface loading by considering the rotational feedback on linear-momentum balance. The correctness of the mathematical extensions of the methods is validated numerically by comparing the polar motion response to the GIA process and the rotationally-induced degree 2 and order 1 spherical harmonic component of the surface vertical displacement and gravity field. We present the difference between the case where the rotational feedback on linear-momentum balance is considered against that where it is not. Numerical simulations show that the resulting difference
The rotational feedback on linear-momentum balance in glacial isostatic adjustment
NASA Astrophysics Data System (ADS)
Martinec, Zdeněk; Hagedoorn, Jan
2014-12-01
The influence of changes in surface ice-mass redistribution and associated viscoelastic response of the Earth, known as glacial isostatic adjustment (GIA), on the Earth's rotational dynamics has long been known. Equally important is the effect of the changes in the rotational dynamics on the viscoelastic deformation of the Earth. This signal, known as the rotational feedback, or more precisely, the rotational feedback on the sea level equation, has been mathematically described by the sea level equation extended for the term that is proportional to perturbation in the centrifugal potential and the second-degree tidal Love number. The perturbation in the centrifugal force due to changes in the Earth's rotational dynamics enters not only into the sea level equation, but also into the conservation law of linear momentum such that the internal viscoelastic force, the perturbation in the gravitational force and the perturbation in the centrifugal force are in balance. Adding the centrifugal-force perturbation to the linear-momentum balance creates an additional rotational feedback on the viscoelastic deformations of the Earth. We term this feedback mechanism, which is studied in this paper, the rotational feedback on the linear-momentum balance. We extend both the time-domain method for modelling the GIA response of laterally heterogeneous earth models developed by Martinec and the traditional Laplace-domain method for modelling the GIA-induced rotational response to surface loading by considering the rotational feedback on linear-momentum balance. The correctness of the mathematical extensions of the methods is validated numerically by comparing the polar-motion response to the GIA process and the rotationally induced degree 2 and order 1 spherical harmonic component of the surface vertical displacement and gravity field. We present the difference between the case where the rotational feedback on linear-momentum balance is considered against that where it is not.
Application of Dynamic Grey-Linear Auto-regressive Model in Time Scale Calculation
NASA Astrophysics Data System (ADS)
Yuan, H. T.; Don, S. W.
2009-01-01
Because of the influence of different noise sources and other factors, the running of an atomic clock is very complex. In order to forecast the rate of an atomic clock accurately, it is necessary to study and design a model to calculate its rate in the near future. Using this rate, the clock can be used in the calculation of local atomic time and the steering of local universal time. In this paper, a new forecast model called the dynamic grey-linear auto-regressive model is studied, and the precision of the new model is given. The new model is tested on real data from the National Time Service Center.
NASA Astrophysics Data System (ADS)
Shortridge, J.; Guikema, S.; Zaitchik, B. F.
2015-12-01
In the past decade, machine-learning methods for empirical rainfall-runoff modeling have seen extensive development. However, the majority of research has focused on a small number of methods, such as artificial neural networks, while not considering other approaches for non-parametric regression that have been developed in recent years. These methods may be able to achieve predictive accuracy comparable to that of ANNs and more easily provide physical insights into the system of interest through evaluation of covariate influence. Additionally, these methods could provide a straightforward, computationally efficient way of evaluating climate change impacts in basins where data to support physical hydrologic models are limited. In this paper, we use multiple regression and machine-learning approaches to predict monthly streamflow in five highly seasonal rivers in the highlands of Ethiopia. We find that generalized additive models, random forests, and cubist models achieve better predictive accuracy than ANNs in many of the basins assessed and are also able to outperform physical models developed for the same region. We discuss some challenges that could hinder the use of such models for climate impact assessment, such as biases resulting from model formulation and prediction under extreme climate conditions, and suggest methods for preventing and addressing these challenges. Finally, we demonstrate how predictor variable influence can be assessed to provide insights into the physical functioning of data-sparse watersheds.
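A minimal random-forest example in the spirit of the comparison above; the predictors and streamflow are synthetic, and feature importances stand in for the covariate-influence evaluation mentioned in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 500
# Hypothetical monthly predictors: precipitation, temperature, lagged flow.
X = np.column_stack([rng.gamma(2.0, 50.0, n),
                     rng.normal(20.0, 5.0, n),
                     rng.gamma(2.0, 30.0, n)])
flow = 0.4 * X[:, 0] + 0.8 * X[:, 2] - 1.5 * X[:, 1] + rng.normal(0.0, 10.0, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, flow, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(rf.score(X_te, y_te))        # predictive accuracy on held-out months
print(rf.feature_importances_)     # covariate influence, as discussed above
```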
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward linear discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of the compounds are also discussed.
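A bare-bones comparison of LR and LDA classifiers on synthetic descriptor data, sketching the kind of head-to-head evaluation reported above; it omits the stepwise selection, validation, and applicability-domain checks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(10)
n = 400
X = rng.normal(size=(n, 3))               # hypothetical molecular descriptors
p = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - X[:, 1])))
y = (rng.random(n) < p).astype(int)       # 1 = excess toxicity, 0 = baseline

lr = LogisticRegression().fit(X, y)
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lr.score(X, y), lda.score(X, y))    # compare classification accuracy
```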
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd, and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R² value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R² values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries.
Schilling, K.E.; Wolter, C.F.
2005-01-01
Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs) and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long-term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall, and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long-term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. Combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture-dominated watersheds in the glaciated Midwest.
Theobald, Roddy; Freeman, Scott
2014-01-01
Although researchers in undergraduate science, technology, engineering, and mathematics education are currently using several methods to analyze learning gains from pre- and posttest data, the most commonly used approaches have significant shortcomings. Chief among these is the inability to distinguish whether differences in learning gains are due to the effect of an instructional intervention or to differences in student characteristics when students cannot be assigned to control and treatment groups at random. Using pre- and posttest scores from an introductory biology course, we illustrate how the methods currently in wide use can lead to erroneous conclusions, and how multiple linear regression offers an effective framework for distinguishing the impact of an instructional intervention from the impact of student characteristics on test score gains. In general, we recommend that researchers always use student-level regression models that control for possible differences in student ability and preparation to estimate the effect of any nonrandomized instructional intervention on student performance. PMID:24591502
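A minimal R sketch of the recommended student-level analysis, with simulated stand-in data: posttest scores are regressed on pretest scores plus an intervention indicator, so the intervention coefficient is adjusted for incoming ability. All variable names and effect sizes here are hypothetical.

set.seed(2)
n        <- 200
pretest  <- rnorm(n, 60, 10)
treated  <- rbinom(n, 1, 0.5)                       # 1 = intervention section
posttest <- 5 + 0.9 * pretest + 4 * treated + rnorm(n, 0, 8)

fit <- lm(posttest ~ pretest + treated)
summary(fit)                 # coefficient on treated = adjusted intervention effect
confint(fit)["treated", ]    # its 95% confidence interval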
NASA Technical Reports Server (NTRS)
Lo, Ching F.
1999-01-01
The integration of Radial Basis Function Networks and Back Propagation Neural Networks with Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modern Design of Experiments. The integrated method is capable of estimating precision intervals, including confidence and prediction intervals. The power of the innovative method has been demonstrated by applying it to a set of wind tunnel test data to construct a response surface and estimate precision intervals.
Lewin, M.D.; Sarasua, S.; Jones, P.A. (Div. of Health Studies)
1999-07-01
For the purpose of examining the association between blood lead levels and household-specific soil lead levels, the authors used a multivariate linear regression model to find a slope factor relating soil lead levels to blood lead levels. They used previously collected data from the Agency for Toxic Substances and Disease Registry's (ATSDR's) multisite lead and cadmium study. The data included blood lead measurements of 1,015 children aged 6-71 months and corresponding household-specific environmental samples. The environmental samples included lead in soil, house dust, interior paint, and tap water. After adjusting for income, education of the parents, presence of a smoker in the household, sex, and dust lead, and using a double log transformation, they found a slope factor of 0.1388 with a 95% confidence interval of 0.09-0.19 for the dose-response relationship between the natural log of the soil lead level and the natural log of the blood lead level. The predicted blood lead level corresponding to a soil lead level of 500 mg/kg was 5.99 µg/kg with a 95% prediction interval of 2.08-17.29. Predicted values and their corresponding prediction intervals varied by covariate level. The model shows that increased soil lead level is associated with elevated blood lead in children, but that predictions based on this regression model are subject to high levels of uncertainty and variability.
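A minimal R sketch of the double-log fit described above, on simulated stand-in data (the ATSDR covariates are omitted here for brevity, so the numbers will not match the study's):

set.seed(3)
n     <- 300
soil  <- exp(rnorm(n, log(300), 1))                        # household soil lead, mg/kg
blood <- exp(0.6 + 0.14 * log(soil) + rnorm(n, 0, 0.5))    # child blood lead

fit <- lm(log(blood) ~ log(soil))
coef(fit)["log(soil)"]                                     # the slope factor

## Predicted blood lead, with a 95% prediction interval, at soil lead = 500 mg/kg
exp(predict(fit, newdata = data.frame(soil = 500),
            interval = "prediction", level = 0.95))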
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies, and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on the five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce the statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanatory contributions of the most important factors in factor analysis, and depreciate the significance of the discriminant function and the discrimination abilities of individual variables in discriminant analysis. The discussion is restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experimental results, which the authors believe is critical to progress in theory development and cumulative knowledge in the ergonomics field.
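The attenuation effect is easy to demonstrate numerically. The classical result is that the observed correlation equals the true correlation times the square root of the product of the two measures' reliabilities, r_obs = r_true * sqrt(rel_x * rel_y). A short R illustration with simulated data:

set.seed(4)
n      <- 5000
true_x <- rnorm(n)
true_y <- 0.6 * true_x + sqrt(1 - 0.6^2) * rnorm(n)     # true correlation = 0.6

rel   <- 0.7                                            # reliability of each scale
obs_x <- sqrt(rel) * true_x + sqrt(1 - rel) * rnorm(n)
obs_y <- sqrt(rel) * true_y + sqrt(1 - rel) * rnorm(n)

cor(true_x, true_y)       # ~0.60
cor(obs_x, obs_y)         # ~0.60 * 0.7 = 0.42, attenuated
cor(obs_x, obs_y) / rel   # disattenuated estimate, ~0.60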
Stevens, F. J.; Bobrovnik, S. A.; Biosciences Division; Palladin Inst. Biochemistry
2007-12-01
Physiological responses of the adaptive immune system are polyclonal in nature, whether induced by a naturally occurring infection, by vaccination to prevent infection or, in the case of animals, by challenge with antigen to generate reagents of research or commercial significance. The composition of the polyclonal response is distinct to each individual or animal and changes over time. Differences exist in the affinities of the constituents and their relative proportions of the responsive population. In addition, some of the antibodies bind to different sites on the antigen, whereas other pairs of antibodies are sterically restricted from concurrent interaction with the antigen. Even if generation of a monoclonal antibody is the ultimate goal of a project, the quality of the resulting reagent is ultimately related to the characteristics of the initial immune response. It is probably impossible to quantitatively parse the composition of a polyclonal response to antigen. However, molecular regression allows further parameterization of a polyclonal antiserum in the context of certain simplifying assumptions. The antiserum is described as consisting of two competing antibody populations of high and low affinity, in unknown relative proportions. This simple model allows the quantitative determination of representative affinities and proportions. These parameters may be of use in evaluating responses to vaccines, in evaluating the continuity of antibody production in vaccine recipients or animals used for the production of antisera, or in optimizing the selection of donors for the production of monoclonal antibodies.
NASA Technical Reports Server (NTRS)
Barrett, C. A.
1985-01-01
Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni-base cast turbine alloys. The U transform, i.e., U = arcsin[(%A/100)^(1/2)], was shown to give the best estimate of the dependent variable, y. A complete second-degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47-term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.
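Reading the U transform as the standard arcsine-square-root transform for percentage data, the transform-then-stepwise workflow looks as follows in R. The alloy chemistries, attack percentages, and reduced three-element term pool are simulated stand-ins, not the paper's 47-term model:

set.seed(5)
n    <- 80
Cr   <- runif(n, 5, 25); Al <- runif(n, 1, 7); Mo <- runif(n, 0, 5)
pctA <- pmin(pmax(10 + 3 * (20 - Cr) + rnorm(n, 0, 8), 0.1), 99.9)   # % attack

U <- asin(sqrt(pctA / 100))      # variance-stabilizing arcsine-square-root transform

full    <- lm(U ~ (Cr + Al + Mo)^2 + I(Cr^2) + I(Al^2) + I(Mo^2))    # full second-degree model
reduced <- step(full, direction = "both", trace = 0)                 # stepwise term selection
summary(reduced)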
NASA Astrophysics Data System (ADS)
Eghnam, Karam M.; Sheta, Alaa F.
2008-06-01
Development of accurate models is necessary in critical applications such as prediction. In this paper, a solution to the stock prediction problem of the Barents Sea capelin is introduced using Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) models. The capelin stock in the Barents Sea is one of the largest in the world. It normally maintained a fishery with annual catches of up to 3 million tons. The capelin stock problem has an impact on fish stock development. The proposed prediction model was developed using ANNs with their weights adapted using a Genetic Algorithm (GA). The proposed model was compared to the traditional linear MLR model. The results showed that the ANN-GA model produced an overall accuracy 21% better than that of the MLR model.
Adjusting for Health Status in Non-Linear Models of Health Care Disparities
Cook, Benjamin L.; McGuire, Thomas G.; Meara, Ellen; Zaslavsky, Alan M.
2009-01-01
This article compared conceptual and empirical strengths of alternative methods for estimating racial disparities using non-linear models of health care access. Three methods were presented (propensity score, rank and replace, and a combined method) that adjust for health status while allowing SES variables to mediate the relationship between race and access to care. Applying these methods to a nationally representative sample of blacks and non-Hispanic whites surveyed in the 2003 and 2004 Medical Expenditure Panel Surveys (MEPS), we assessed the concordance of each of these methods with the Institute of Medicine (IOM) definition of racial disparities, and empirically compared the methods' predicted disparity estimates, the variance of the estimates, and the sensitivity of the estimates to limitations of available data. The rank and replace and combined methods (but not the propensity score method) are concordant with the IOM definition of racial disparities in that each creates a comparison group with the appropriate marginal distributions of health status and SES variables. Predicted disparities and prediction variances were similar for the rank and replace and combined methods, but the rank and replace method was sensitive to limitations on SES information. For all methods, limiting health status information significantly reduced estimates of disparities compared to a more comprehensive dataset. We conclude that the two IOM-concordant methods were similar enough that either could be considered in disparity predictions. In datasets with limited SES information, the combined method is the better choice. PMID:20352070
Hassan, A. K.
2015-01-01
In this work, O/W emulsion sets were prepared by using different concentrations of two nonionic surfactants. The two surfactants, tween 80 (HLB=15.0) and span 80 (HLB=4.3), were used in a fixed proportion of 0.55:0.45, respectively. The HLB value of the surfactant blends was fixed at 10.185. The surfactant blend concentration ranged from 3% up to 19%. For each O/W emulsion set the conductivity was measured at room temperature (25±2°) and at 40, 50, 60, 70 and 80°. Applying the simple linear regression least squares method to the temperature-conductivity data determines the effective surfactant blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by physical stability centrifugation testing and by phase inversion temperature range measurements. The results indicated that the relation representing the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity, and this relationship is linear up to 80°. This work shows that the most stable O/W emulsion is identified by the maximum R² value obtained when the simple linear regression least squares method is applied to the temperature-conductivity data up to 80°; in addition, the true maximum slope is given by the equation with the maximum R² value. Because conditions would change in a more complex formulation, the method for determining the effective surfactant blend concentration was verified by applying it to a more complex formulation of 2% O/W miconazole nitrate cream, and the results indicate its reproducibility. PMID:26664063
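The selection rule described above (fit a line per concentration, keep the concentration whose fit has the largest R²) is a one-liner per set in R. Temperatures, concentrations, and conductivities below are simulated stand-ins:

set.seed(6)
temps <- c(25, 40, 50, 60, 70, 80)
concs <- seq(3, 19, by = 2)                     # surfactant blend concentration, %
sets  <- lapply(concs, function(cc) {
  data.frame(temp = temps,
             cond = 50 + (0.5 + 0.05 * cc) * temps + rnorm(length(temps), 0, 3))
})

r2 <- sapply(sets, function(d) summary(lm(cond ~ temp, data = d))$r.squared)
concs[which.max(r2)]                            # concentration giving the most stable emulsion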
ERIC Educational Resources Information Center
Phillips, Gary W.
The usefulness of path analysis as a means of better understanding various linear models is demonstrated. First, two linear models are presented in matrix form using linear structural relations (LISREL) notation. The two models, regression and factor analysis, are shown to be identical although the research question and data matrix to which these…
NASA Technical Reports Server (NTRS)
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all-rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf
2015-10-01
The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037
NASA Astrophysics Data System (ADS)
Joshi, Deepti; St-Hilaire, André; Daigle, Anik; Ouarda, Taha B. M. J.
2013-04-01
This study attempts to compare the performance of two statistical downscaling frameworks in downscaling hydrological indices (descriptive statistics) characterizing the low flow regimes of three rivers in Eastern Canada: Moisie, Romaine and Ouelle. The statistical models selected are the Relevance Vector Machine (RVM), an implementation of Sparse Bayesian Learning, and the Automated Statistical Downscaling tool (ASD), an implementation of Multiple Linear Regression. Inputs to both frameworks involve climate variables significantly (α = 0.05) correlated with the indices. These variables were processed using Canonical Correlation Analysis, and the resulting canonical variate scores were used as input to RVM to estimate the selected low flow indices. In ASD, the significantly correlated climate variables were subjected to backward stepwise predictor selection, and the selected predictors were subsequently used to estimate the selected low flow indices using Multiple Linear Regression. With respect to the correlation between climate variables and the selected low flow indices, it was observed that all indices are influenced primarily by wind components (vertical, zonal and meridional) and humidity variables (specific and relative humidity). The downscaling performance of the framework involving RVM was found to be better than that of ASD in terms of Relative Root Mean Square Error, Relative Mean Absolute Bias and Coefficient of Determination. In all cases, the former resulted in less variability of the performance indices between calibration and validation sets, implying better generalization ability than the latter.
A componential model of human interaction with graphs: 1. Linear regression modeling
NASA Technical Reports Server (NTRS)
Gillan, Douglas J.; Lewis, Robert
1994-01-01
Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes: searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and responding; and (2) that the type of graph and the user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications to the MA-P model, alternative models, and design implications of the MA-P model.
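The one- and two-parameter fits have a direct R analogue: regress response time on the total step count, or on separate arithmetic and nonarithmetic step counts. Step counts and timing values below are simulated stand-ins:

set.seed(7)
n_trials    <- 60
arith_steps <- sample(0:4, n_trials, replace = TRUE)    # arithmetic components per trial
other_steps <- sample(1:6, n_trials, replace = TRUE)    # search/encode/compare/respond
rt <- 400 + 350 * arith_steps + 250 * other_steps + rnorm(n_trials, 0, 150)

fit1 <- lm(rt ~ I(arith_steps + other_steps))   # one parameter: equal component weights
fit2 <- lm(rt ~ arith_steps + other_steps)      # two parameters: separate weights

summary(fit1)$r.squared
summary(fit2)$r.squared    # typically higher, at the cost of one extra parameter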
Nucleus detection using gradient orientation information and linear least squares regression
NASA Astrophysics Data System (ADS)
Kwak, Jin Tae; Hewitt, Stephen M.; Xu, Sheng; Pinto, Peter A.; Wood, Bradford J.
2015-03-01
Computerized histopathology image analysis enables an objective, efficient, and quantitative assessment of digitized histopathology images. Such analysis often requires an accurate and efficient detection and segmentation of histological structures such as glands, cells and nuclei. The segmentation is used to characterize tissue specimens and to determine the disease status or outcomes. The segmentation of nuclei, in particular, is challenging due to the overlapping or clumped nuclei. Here, we propose a nuclei seed detection method for the individual and overlapping nuclei that utilizes the gradient orientation or direction information. The initial nuclei segmentation is provided by a multiview boosting approach. The angle of the gradient orientation is computed and traced for the nuclear boundaries. Taking the first derivative of the angle of the gradient orientation, high concavity points (junctions) are discovered. False junctions are found and removed by adopting a greedy search scheme with the goodness-of-fit statistic in a linear least squares sense. Then, the junctions determine boundary segments. Partial boundary segments belonging to the same nucleus are identified and combined by examining the overlapping area between them. Using the final set of the boundary segments, we generate the list of seeds in tissue images. The method achieved an overall precision of 0.89 and a recall of 0.88 in comparison to the manual segmentation.
NASA Astrophysics Data System (ADS)
Elliott, J.; de Souza, R. S.; Krone-Martins, A.; Cameron, E.; Ishida, E. E. O.; Hilbe, J.
2015-04-01
Machine learning techniques offer a precious tool box for use within astronomy to solve problems involving so-called big data. They provide a means to make accurate predictions about a particular system without prior knowledge of the underlying physical processes of the data. In this article, and the companion papers of this series, we present the set of Generalized Linear Models (GLMs) as a fast alternative method for tackling general astronomical problems, including the ones related to the machine learning paradigm. To demonstrate the applicability of GLMs to inherently positive and continuous physical observables, we explore their use in estimating the photometric redshifts of galaxies from their multi-wavelength photometry. Using the gamma family with a log link function we predict redshifts from the PHoto-z Accuracy Testing simulated catalogue and a subset of the Sloan Digital Sky Survey from Data Release 10. We obtain fits that result in catastrophic outlier rates as low as ∼1% for simulated and ∼2% for real data. Moreover, we can easily obtain such levels of precision within a matter of seconds on a normal desktop computer and with training sets that contain merely thousands of galaxies. Our software is made publicly available as a user-friendly package developed in Python, R and via an interactive web application. This software allows users to apply a set of GLMs to their own photometric catalogues and generates publication quality plots with minimum effort. By facilitating their ease of use to the astronomical community, this paper series aims to make GLMs widely known and to encourage their implementation in future large-scale projects, such as the Large Synoptic Survey Telescope.
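A minimal R version of the regression described above: a gamma-family GLM with a log link fit to photometry, which guarantees positive predicted redshifts. The magnitudes and redshifts are simulated stand-ins, not the PHAT or SDSS catalogues:

set.seed(8)
n    <- 1000
mag1 <- runif(n, 18, 24)
mag2 <- mag1 + rnorm(n, 0.5, 0.3)
z    <- rgamma(n, shape = 10, rate = 10 / exp(-4 + 0.18 * mag1))   # positive response

fit <- glm(z ~ mag1 + mag2, family = Gamma(link = "log"))
summary(fit)

## Predicted redshifts for new photometry
predict(fit, newdata = data.frame(mag1 = c(20, 22), mag2 = c(20.5, 22.4)),
        type = "response")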
Kuroda, S.; Okugi, T.; Tauchi, T.; Fujisawa, H.; Ichikawa, M.; Iwashita, Y.; Tajima, Y.; Kumada, M.; Spencer, Cherrill M.; /SLAC
2008-01-18
An adjustable permanent magnet quadrupole has been developed for the final focus (FF) in a linear collider. Recent activities include a newly fabricated inner ring to demonstrate the strongest field gradient at a smaller bore diameter of 15mm and a magnetic field measurement system with a new rotating coil. The prospects of the R&D will be discussed.
Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina
2009-11-15
Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal aspects of hospital waste management. Based on this fact, two predictor models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation in total and by type (sharp, infectious and general). In this study, a 5-fold cross-validation procedure on a database containing a total of 50 hospitals of Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R², were used to evaluate the performance of the models. The MLR, as a conventional model, obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as the more significant parameters. On the other hand, ANNs, as a more powerful model that had not previously been introduced for predicting the rate of medical waste generation, showed high performance measure values, especially an R² value of 0.99, confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving, which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in the future.
NASA Astrophysics Data System (ADS)
dos Santos, T. S.; Mendes, D.; Torres, R. R.
2015-08-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used GCM experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.
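A minimal sketch of MLR-by-principal-components downscaling in R: the large-scale predictor fields are compressed with PCA and the leading component scores enter an ordinary linear regression for station precipitation. Predictor counts, component counts, and the rainfall series are simulated stand-ins, not CMIP5 output:

set.seed(9)
n_months <- 360
X  <- matrix(rnorm(n_months * 20), n_months, 20)                 # 20 large-scale predictors
pr <- 100 + 15 * X[, 1] - 10 * X[, 2] + rnorm(n_months, 0, 20)   # monthly precipitation

pca    <- prcomp(X, scale. = TRUE)
scores <- pca$x[, 1:5]          # keep the 5 leading principal components
fit    <- lm(pr ~ scores)       # the MLR downscaling model
summary(fit)$r.squared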
NASA Astrophysics Data System (ADS)
Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.
2016-01-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena
2013-01-01
The performance of two QSAR methodologies, namely Multiple Linear Regression (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regression (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behavior when compared with all other methods. MLR has, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in the training set and 18 in the test set) were obtained with AsNNs using seven descriptors (R² of 0.874 and RMSE of 0.437, against R² of 0.845 and RMSE of 0.472 for MLRs, on the test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from the MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis, showing an activity close to that predicted by the majority of the models.
NASA Astrophysics Data System (ADS)
Buermeyer, Jonas; Gundlach, Matthias; Grund, Anna-Lisa; Grimm, Volker; Spizyn, Alexander; Breckow, Joachim
2016-09-01
This work is part of the analysis of the effects of constructional energy-saving measures on radon concentration levels in dwellings, performed on behalf of the German Federal Office for Radiation Protection. In parallel to radon measurements for five buildings, both meteorological data outside the buildings and indoor climate factors were recorded. In order to assess the effects of occupancy, the amount of carbon dioxide (CO2) was measured. For a statistical linear regression model, the data of one object was chosen as an example. Three dummy variables were extracted from the course of the CO2 concentration to provide information on the usage and ventilation of the room. The analysis revealed a highly autoregressive model for the radon concentration with additional influence by the natural environmental factors. The autoregression implies a strong dependency on a radon source, since it reflects a backward dependency in time. At this point of the investigation, it cannot be determined whether the influence of outside factors affects the source of radon or the inhabitants' ventilation behavior, resulting in variation of the occurring concentration levels. In any case, the regression analysis might provide further information that would help to distinguish these effects. In the next step, the influence factors will be weighted according to their impact on the concentration levels. This might lead to a model that enables the prediction of radon concentration levels based on the measurement of CO2 in combination with environmental parameters, as well as the development of advice on ventilation.
2013-01-01
Background: Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code. Results: We improve speed: computation time is reduced from 6 hours to 10-15 minutes. Our approach can handle essentially an unlimited number of covariates efficiently, using projections. Data files in GWAS are vast and reading them into computer memory becomes an important issue. However, much improvement can be made if the data is structured beforehand in a way allowing for easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf. We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures. Conclusions: We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access. PMID:23711206
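The matrix trick behind this kind of semi-parallel regression is compact enough to sketch. Assuming covariates are projected out first, the per-SNP slopes and t-statistics for y ~ snp reduce to column-wise cross-products, with no per-SNP lm() loop; the sizes below are toy stand-ins:

set.seed(10)
n <- 1000; m <- 5000                        # samples x SNPs
G <- matrix(rbinom(n * m, 2, 0.3), n, m)    # genotype dosages 0/1/2
C <- cbind(1, rnorm(n))                     # intercept + one covariate
y <- rnorm(n)                               # phenotype

Q  <- qr.Q(qr(C))                           # orthonormal basis of the covariate space
ry <- y - Q %*% crossprod(Q, y)             # residualized phenotype
rG <- G - Q %*% crossprod(Q, G)             # residualized genotypes, all SNPs at once

den   <- colSums(rG^2)
beta  <- as.vector(crossprod(rG, ry)) / den # per-SNP regression slopes
df    <- n - ncol(C) - 1
se    <- sqrt((sum(ry^2) / den - beta^2) / df)
tstat <- beta / se                          # one t-statistic per SNP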
Silva, Ana Elisa Pereira; Freitas, Corina da Costa; Dutra, Luciano Vieira; Molento, Marcelo Beltrão
2016-02-15
Fasciola hepatica is the causative agent of fasciolosis, a disease that triggers a chronic inflammatory process in the liver, affecting mainly ruminants and other animals, including humans. In Brazil, F. hepatica occurs in largest numbers in the southernmost state of Rio Grande do Sul. The objective of this study was to estimate areas at risk using an eight-year (2002-2010) time series of the climatic and environmental variables that best relate to the disease, using a linear regression method for municipalities in the state of Rio Grande do Sul. The positivity index of the disease, which is the proportion of infected animals among slaughtered animals, was divided into three risk classes: low, medium and high. The accuracy of the known sample classification in the confusion matrix for the low, medium and high rates produced by the estimated model presented values between 39 and 88%, depending on the year. The regression analysis showed the importance of time-based data for the construction of the model, considering the two variables from the year preceding the event (positivity index and maximum temperature). The generated data are important for epidemiological and parasite control studies, mainly because F. hepatica infection can last from months to years. PMID:26827853
Technology Transfer Automated Retrieval System (TEKTRAN)
A stochastic/linear program Excel workbook was developed consisting of two worksheets illustrating linear and stochastic program approaches. Both approaches used the Excel Solver add-in. A published linear program problem served as an example for the ingredients, nutrients and costs and as a benchma...
ERIC Educational Resources Information Center
Thatcher, Greg W.; Henson, Robin K.
This study examined research in training and development to determine effect size reporting practices. It focused on the reporting of corrected effect sizes in research articles using multiple regression analyses. When possible, researchers calculated corrected effect sizes and determined whether the associated shrinkage could have impacted researcher…
ERIC Educational Resources Information Center
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…
Rafiei, Hamid; Khanzadeh, Marziyeh; Mozaffari, Shahla; Bostanifar, Mohammad Hassan; Avval, Zhila Mohajeri; Aalizadeh, Reza; Pourbasheer, Eslam
2016-01-01
Quantitative structure-activity relationship (QSAR) study has been employed for predicting the inhibitory activities of Hepatitis C virus (HCV) NS5B polymerase inhibitors. A data set consisting of 72 compounds was selected, and then different types of molecular descriptors were calculated. The whole data set was split into a training set (80% of the dataset) and a test set (20% of the dataset) using principal component analysis. The stepwise (SW) and genetic algorithm (GA) techniques were used as variable selection tools. The multiple linear regression method was then used to linearly correlate the selected descriptors with the inhibitory activities. Several validation techniques, including leave-one-out and leave-group-out cross-validation and the Y-randomization method, were used to evaluate the internal capability of the derived models. The external prediction ability of the derived models was further analyzed using modified r², concordance correlation coefficient values, and the Golbraikh and Tropsha acceptable model criteria. Based on the derived results (GA-MLR), some new insights toward the molecular structural requirements for obtaining better inhibitory activity were obtained. PMID:27065774
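Leave-one-out cross-validation of a descriptor-based MLR model, as used above for internal validation, can be written in a few lines of R; the q² statistic summarizes the predictions. Descriptors and activities below are simulated stand-ins, not the HCV data set:

set.seed(11)
n <- 72
d <- data.frame(d1 = rnorm(n), d2 = rnorm(n), d3 = rnorm(n))
d$pIC50 <- 5 + 0.8 * d$d1 - 0.5 * d$d2 + rnorm(n, 0, 0.4)

loo_pred <- sapply(seq_len(n), function(i) {
  fit <- lm(pIC50 ~ d1 + d2 + d3, data = d[-i, ])   # refit without compound i
  predict(fit, newdata = d[i, ])                    # predict the held-out compound
})

press <- sum((d$pIC50 - loo_pred)^2)
q2    <- 1 - press / sum((d$pIC50 - mean(d$pIC50))^2)
q2    # internal predictive ability; external test-set validation remains essential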
Cohen, B.L.
1992-12-31
A plot of lung-cancer rates versus radon exposures in 965 US counties, or in all US states, has a strong negative slope, b, in sharp contrast to the strong positive slope predicted by linear/no-threshold theory. The discrepancy between these slopes exceeds 20 standard deviations (SD). Including smoking frequency in the analysis substantially improves fits to a linear relationship but has little effect on the discrepancy in b, because correlations between smoking frequency and radon levels are quite weak. Including 17 socioeconomic variables (SEV) in multiple regression analysis reduces the discrepancy to 15 SD. Data were divided into segments by stratifying on each SEV in turn, on geography, and on both simultaneously, giving over 300 data sets to be analyzed individually, but negative slopes predominated. The slope is negative whether one considers only the most urban counties or only the most rural; only the richest or only the poorest; only the richest in the South Atlantic region or only the poorest in that region; and so on, for all the strata in between. Since this is an ecological study, the well-known problems with ecological studies were investigated and found not to be applicable here. The "ecological fallacy" was shown not to apply in testing a linear/no-threshold theory, and the vulnerability to confounding is greatly reduced when confounding factors are only weakly correlated with radon levels, as is generally the case here. All confounding factors known to correlate with radon and with lung cancer were investigated quantitatively and found to have little effect on the discrepancy.
Kokaly, R.F.; Clark, R.N.
1999-01-01
We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 µm, 2.10 µm, and 2.30 µm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R² from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.
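The continuum-removal and band-depth steps that feed the stepwise regression are straightforward to sketch in base R. The wavelength grid and the synthetic absorption feature below are stand-ins; a real application would use measured spectra and then regress chemistry on the normalized depths at the selected wavelengths:

wl   <- seq(2.0, 2.2, by = 0.005)                     # wavelength, µm
refl <- 0.55 - 0.18 * exp(-((wl - 2.10) / 0.03)^2)    # synthetic feature centered at 2.10 µm

## Linear continuum drawn between the feature's shoulders
shoulders <- c(which.min(abs(wl - 2.0)), which.min(abs(wl - 2.2)))
cont <- approx(wl[shoulders], refl[shoulders], xout = wl)$y

depth      <- 1 - refl / cont          # continuum-removed band depth
norm_depth <- depth / max(depth)       # normalized band depth
wl[which.max(depth)]                   # band center, ~2.10 µm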
NASA Astrophysics Data System (ADS)
Lee, C. Y.; Tippett, M. K.; Sobel, A. H.; Camargo, S. J.
2014-12-01
We are working towards the development of a new statistical-dynamical downscaling system to study the influence of climate on tropical cyclones (TCs). The first step is the development of an appropriate model for TC intensity as a function of environmental variables. We approach this issue with a stochastic model consisting of a multiple linear regression model (MLR) for 12-hour intensity forecasts as a deterministic component, and a random error generator as a stochastic component. Similar to the operational Statistical Hurricane Intensity Prediction Scheme (SHIPS), the MLR relates the surrounding environment to storm intensity, but with only essential predictors calculated from monthly-mean NCEP reanalysis fields (potential intensity, shear, etc.) and from persistence. The deterministic MLR is developed with data from 1981-1999 and tested with data from 2000-2012 for the Atlantic, Eastern North Pacific, Western North Pacific, Indian Ocean, and Southern Hemisphere basins. While the global MLR's skill is comparable to that of the operational statistical models (e.g., SHIPS), the distribution of the predicted maximum intensity from the deterministic results has a systematic low bias compared to observations; the deterministic MLR creates almost no storms with intensities greater than 100 kt. The deterministic MLR can be significantly improved by adding the stochastic component, based on the distribution of random forecasting errors from the deterministic model compared to the training data. This stochastic component may be thought of as representing the component of TC intensification that is not linearly related to the environmental variables. We find that in order for the stochastic model to accurately capture the observed distribution of maximum storm intensities, the stochastic component must be auto-correlated across 12-hour time steps. This presentation also includes a detailed discussion of the distributions of other TC-intensity related quantities.
Martin, L; Mezcua, M; Ferrer, C; Gil Garcia, M D; Malato, O; Fernandez-Alba, A R
2013-01-01
The main objective of this work was to establish a mathematical function that correlates pesticide residue levels in apple juice with the levels of the pesticides applied on the raw fruit, taking into account some of their physicochemical properties such as water solubility, the octanol/water partition coefficient, the organic carbon partition coefficient, vapour pressure and density. A mixture of 12 pesticides was applied to an apple tree; apples were collected after 10 days of application. After harvest, apples were treated with a mixture of three post-harvest pesticides and the fruits were then processed in order to obtain apple juice following a routine industrial process. The pesticide residue levels in the apple samples were analysed using two multi-residue methods based on LC-MS/MS and GC-MS/MS. The concentration of pesticides was determined in samples derived from the different steps of processing. The processing factors (the coefficient between residue level in the processed commodity and the residue level in the commodity to be processed) obtained for the full juicing process were found to vary among the different pesticides studied. In order to investigate the relationships between the levels of pesticide residue found in apple juice samples and their physicochemical properties, principal component analysis (PCA) was performed using two sets of samples (one of them using experimental data obtained in this work and the other including the data taken from the literature). In both cases the correlation was found between processing factors of pesticides in the apple juice and the negative logarithms (base 10) of the water solubility, octanol/water partition coefficient and organic carbon partition coefficient. The linear correlation between these physicochemical properties and the processing factor were established using a multiple linear regression technique.
Zheng, Fang; Zhan, Max; Huang, Xiaoqin; AbdulHameed, Mohamed Diwan M.; Zhan, Chang-Guo
2013-01-01
Butyrylcholinesterase (BChE) has been an important protein used for development of anti-cocaine medication. Through computational design, BChE mutants with ~2000-fold improved catalytic efficiency against cocaine have been discovered in our lab. To study drug-enzyme interaction it is important to build mathematical model to predict molecular inhibitory activity against BChE. This report presents a neural network (NN) QSAR study, compared with multi-linear regression (MLR) and molecular docking, on a set of 93 small molecules that act as inhibitors of BChE by use of the inhibitory activities (pIC50 values) of the molecules as target values. The statistical results for the linear model built from docking generated energy descriptors were: r2 = 0.67, rmsd = 0.87, q2 = 0.65 and loormsd = 0.90; The statistical results for the ligand-based MLR model were: r2 = 0.89, rmsd = 0.51, q2 = 0.85 and loormsd = 0.58; the statistical results for the ligand-based NN model were the best: r2 = 0.95, rmsd = 0.33, q2 = 0.90 and loormsd = 0.48, demonstrating that the NN is powerful in analysis of a set of complicated data. As BChE is also an established drug target to develop new treatment for Alzheimer’s disease (AD). The developped QSAR models provide tools for rationalizing identification of potential BChE inhibitors or selection of compounds for synthesis in the discovery of novel effective inhibitors of BChE in the future. PMID:24290065
NASA Astrophysics Data System (ADS)
Deml, Ann M.; O'Hayre, Ryan; Wolverton, Chris; Stevanović, Vladan
2016-02-01
The availability of quantitatively accurate total energies (Etot) of atoms, molecules, and solids, enabled by the development of density functional theory (DFT), has transformed solid state physics, quantum chemistry, and materials science by allowing direct calculations of measurable quantities, such as enthalpies of formation (ΔHf). Still, the ability to compute Etot and ΔHf values does not, necessarily, provide insights into the physical mechanisms behind their magnitudes or chemical trends. Here, we examine a large set of calculated Etot and ΔHf values obtained from the DFT+U-based fitted elemental-phase reference energies (FERE) approach [V. Stevanović, S. Lany, X. Zhang, and A. Zunger, Phys. Rev. B 85, 115104 (2012)] to probe relationships between the Etot/ΔHf of metal-nonmetal compounds in their ground-state crystal structures and properties describing the compound compositions and their elemental constituents. From a stepwise linear regression, we develop a linear model for Etot, and consequently ΔHf, that reproduces calculated FERE values with a mean absolute error of ~80 meV/atom. The most significant contributions to the model include calculated total energies of the constituent elements in their reference phases (e.g., metallic iron or gas-phase O2), atomic ionization energies and electron affinities, Pauling electronegativity differences, and atomic electric polarizabilities. These contributions are discussed in the context of their connection to the underlying physics. We also demonstrate that our Etot/ΔHf model can be directly extended to predict the Etot and ΔHf of compounds outside the set used to develop the model.
Rossi, D J; Kress, D D; Tess, M W; Burfening, P J
1992-05-01
Standard linear adjustment of weaning weight to a constant age has been shown to introduce bias in the adjusted weight due to nonlinear growth from birth to weaning of beef calves. Ten years of field records from the five strains of Beefbooster Cattle Alberta Ltd. seed stock herds were used to investigate the use of correction factors to adjust standard 180-d weight (WT180) for this bias. Statistical analyses were performed within strain and followed three steps: 1) the full data set was split into an estimation set (ES) and a validation set (VS), 2) WT180 from the ES was used to develop estimates of correction factors using a model including herd (H), year (YR), age of dam (DA), sex of calf (S), all two- and three-way interactions, and any significant linear and quadratic covariates of calf age at weaning deviated from 180 d (DEVCA) and interactions between DEVCA and DA, S or DA x S, and 3) significant DEVCA coefficients were used to correct WT180 from the VS, then WT180 and the corrected weight (WTCOR) from the VS were analyzed with the same model as in Step 2 and the significance of DEVCA terms was compared. Two types of data splitting were used. Adjusted R² was calculated to describe the proportion of total variation of DEVCA terms explained for WT180 from the ES. The DEVCA terms explained .08 to 1.54% of the total variation for the five strains. Linear and quadratic correction factors were both positive and negative. Bias in WT180 from the ES within 180 ± 35 d of age ranged from 2.8 to 21.7 kg. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:1526901
Alexeeff, Stacey E; Carroll, Raymond J; Coull, Brent
2016-04-01
Spatial modeling of air pollution exposures is widespread in air pollution epidemiology research as a way to improve exposure assessment. However, there are key sources of exposure model uncertainty when air pollution is modeled, including estimation error and model misspecification. We examine the use of predicted air pollution levels in linear health effect models under a measurement error framework. For the prediction of air pollution exposures, we consider a universal Kriging framework, which may include land-use regression terms in the mean function and a spatial covariance structure for the residuals. We derive the bias induced by estimation error and by model misspecification in the exposure model, and we find that a misspecified exposure model can induce asymptotic bias in the effect estimate of air pollution on health. We propose a new spatial simulation extrapolation (SIMEX) procedure, and we demonstrate that the procedure has good performance in correcting this asymptotic bias. We illustrate spatial SIMEX in a study of air pollution and birthweight in Massachusetts.
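As a point of reference, here is a minimal sketch of the classical (non-spatial) SIMEX idea behind the proposed procedure, assuming additive measurement error with known variance sigma2_u; the spatial SIMEX of the paper is more involved.

    # Hedged SIMEX sketch: add extra noise at levels lambda, track the slope,
    # and extrapolate the trend back to lambda = -1 (the no-error limit).
    import numpy as np

    def simex_slope(w, y, sigma2_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200, seed=0):
        rng = np.random.default_rng(seed)

        def ols_slope(x, y):
            A = np.column_stack([np.ones_like(x), x])
            return np.linalg.lstsq(A, y, rcond=None)[0][1]

        lam_grid = [0.0] + list(lambdas)
        mean_slopes = []
        for lam in lam_grid:
            reps = 1 if lam == 0 else B
            slopes = [ols_slope(w + rng.normal(0.0, np.sqrt(lam * sigma2_u), w.size), y)
                      for _ in range(reps)]
            mean_slopes.append(np.mean(slopes))
        coef = np.polyfit(lam_grid, mean_slopes, 2)   # quadratic in lambda
        return np.polyval(coef, -1.0)                 # bias-corrected slope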
NASA Astrophysics Data System (ADS)
Frecon, Jordan; Didier, Gustavo; Pustelnik, Nelly; Abry, Patrice
2016-08-01
Self-similarity is widely considered the reference framework for modeling the scaling properties of real-world data. However, most theoretical studies and their practical use have remained univariate. Operator Fractional Brownian Motion (OfBm) was recently proposed as a multivariate model for self-similarity. Yet it has remained seldom used in applications because of serious issues that appear in the joint estimation of its numerous parameters. While the univariate fractional Brownian motion requires the estimation of two parameters only, its mere bivariate extension already involves 7 parameters which are very different in nature. The present contribution proposes a method for the full identification of bivariate OfBm (i.e., the joint estimation of all parameters) through an original formulation as a non-linear wavelet regression coupled with a custom-made Branch & Bound numerical scheme. The estimation performance (consistency and asymptotic normality) is mathematically established and numerically assessed by means of Monte Carlo experiments. The impact of the parameters defining OfBm on the estimation performance as well as the associated computational costs are also thoroughly investigated.
NASA Astrophysics Data System (ADS)
de Souza, R. S.; Hilbe, J. M.; Buelens, B.; Riggs, J. D.; Cameron, E.; Ishida, E. E. O.; Chies-Santos, A. L.; Killedar, M.
2015-10-01
In this paper, the third in a series illustrating the power of generalized linear models (GLMs) for the astronomical community, we elucidate the potential of the class of GLMs which handles count data. The size of a galaxy's globular cluster (GC) population (NGC) is a prolonged puzzle in the astronomical literature. It falls in the category of count data analysis, yet it is usually modelled as if it were a continuous response variable. We have developed a Bayesian negative binomial regression model to study the connection between NGC and the following galaxy properties: central black hole mass, dynamical bulge mass, bulge velocity dispersion and absolute visual magnitude. The methodology introduced herein naturally accounts for heteroscedasticity, intrinsic scatter, errors in measurements in both axes (either discrete or continuous) and allows modelling the population of GCs on their natural scale as a non-negative integer variable. Prediction intervals of 99 per cent around the trend for expected NGC comfortably envelope the data, notably including the Milky Way, which has hitherto been considered a problematic outlier. Finally, we demonstrate how random intercept models can incorporate information of each particular galaxy morphological type. Bayesian variable selection methodology allows for automatically identifying galaxy types with different productions of GCs, suggesting that on average S0 galaxies have a GC population 35 per cent smaller than other types with similar brightness.
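To make the model class concrete, here is a hedged sketch of a plain (non-Bayesian) negative binomial count regression; the predictor columns stand in for the galaxy properties above, and the paper's Bayesian errors-in-variables machinery is not reproduced.

    # Hedged sketch: negative binomial regression for over-dispersed counts.
    import numpy as np
    import statsmodels.api as sm

    def fit_counts(counts, predictors):
        X = sm.add_constant(predictors)   # intercept + galaxy-property columns
        result = sm.GLM(counts, X, family=sm.families.NegativeBinomial()).fit()
        return result                     # exp(result.params) gives rate ratios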
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first applied to decompose the original time series into several subseries at different scales. Then, principal component analysis (PCA) is used to process the subseries data for the MLR forecast of crude oil prices. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily crude oil market for West Texas Intermediate (WTI) is used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666
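A hedged sketch of the wavelet-plus-MLR idea follows, using PyWavelets; the wavelet, decomposition level and lag structure are illustrative assumptions, and the paper's PCA preprocessing and PSO parameter tuning are omitted.

    # Hedged sketch: decompose a series into wavelet subseries, then regress
    # the next value on lagged subseries values with ordinary MLR.
    import numpy as np
    import pywt
    from sklearn.linear_model import LinearRegression

    def wmlr_one_step(series, lags=3, wavelet='db4', level=2):
        coeffs = pywt.wavedec(series, wavelet, level=level)
        parts = []
        for i in range(len(coeffs)):
            kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
            parts.append(pywt.waverec(kept, wavelet)[:series.size])
        parts = np.array(parts)                         # (level+1, T) subseries
        T = series.size
        X = np.array([parts[:, t - lags:t].ravel() for t in range(lags, T)])
        y = series[lags:]
        model = LinearRegression().fit(X[:-1], y[:-1])  # train on the history
        return model.predict(X[-1:])[0]                 # one-step-ahead forecast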
Wang, S; Huang, G H; He, L
2012-09-01
Groundwater contamination by dense non-aqueous phase liquids (DNAPLs) has become an issue of great concern in many industrialized countries due to their serious threat to human health. Dissolution and transport of DNAPLs in porous media are complicated, multidimensional and multiphase processes, which pose formidable challenges for investigation of their behaviors and implementation of effective remediation technologies. Numerical simulation models could help gain in-depth insight into complex mechanisms of DNAPLs dissolution and transport processes in the subsurface; however, they were computationally expensive, especially when a large number of runs were required, which was considered as a major obstacle for conducting further analysis. Therefore, proxy models that mimic key characteristics of a full simulation model were desired to save many orders of magnitude of computational cost. In this study, a clusterwise-linear-regression (CLR)-based forecasting system was developed for establishing a statistical relationship between DNAPL dissolution behaviors and system conditions under discrete and nonlinear complexities. The results indicated that the developed CLR-based forecasting system was capable not only of predicting DNAPL concentrations with acceptable error levels, but also of providing a significance level in each cutting/merging step such that the accuracies of the developed forecasting trees could be controlled. This study was a first attempt to apply the CLR model to characterize DNAPL dissolution and transport processes. PMID:22789814
Kjelstrom, L.C.
1995-01-01
Previously developed U.S. Geological Survey regional regression models of runoff and 11 chemical constituents were evaluated to assess their suitability for use in urban areas in Boise and Garden City. Data collected in the study area were used to develop adjusted regional models of storm-runoff volumes and mean concentrations and loads of chemical oxygen demand, dissolved and suspended solids, total nitrogen and total ammonia plus organic nitrogen as nitrogen, total and dissolved phosphorus, and total recoverable cadmium, copper, lead, and zinc. Explanatory variables used in these models were drainage area, impervious area, land-use information, and precipitation data. Mean annual runoff volume and loads at the five outfalls were estimated from 904 individual storms during 1976 through 1993. Two methods were used to compute individual storm loads. The first method used adjusted regional models of storm loads and the second used adjusted regional models for mean concentration and runoff volume. For large storms, the first method seemed to produce excessively high loads for some constituents and the second method provided more reliable results for all constituents except suspended solids. The first method provided more reliable results for large storms for suspended solids.
Jian, Shih-Jie; Kou, Chwung-Shan; Hwang, Jennchang; Lee, Chein-Dhau; Lin, Wei-Cheng
2013-06-15
A method for controlling the pretilt angles of liquid crystals (LC) was developed. Hexamethyldisiloxane polymer films were first deposited on indium tin oxide-coated glass plates using a linear atmospheric pressure plasma source. The films were subsequently treated with the rubbing method for LC alignment. Fourier transform infrared spectroscopy and X-ray photoelectron spectroscopy measurements were used to characterize the film composition, which could be varied to control the surface energy by adjusting the monomer feed rate and input power. The results of LC alignment experiments showed that the pretilt angle continuously increased from 0° to 90° with decreasing film surface energy.
Langer, S F
2000-02-01
Left ventricular isovolumic pressure fall is characterized by the time constant tau obtained by fitting the exponential p(t) = p(infinity) + (p(0) - p(infinity))·exp(-t/tau) to pressure fall. It has been shown that tau, calculated from the first half of pressure fall, differs considerably from that found at late relaxation in normal and pathophysiological conditions. The present study aims to test for such differences statistically and to quantify tau changes during relaxation. Two improvements of the common regression procedure are introduced for that purpose: the use of the four-parameter regression function, p(t) = p(infinity) + (p(0) - p(infinity))·exp[-t/(tau(0) + b(tau)·t)], and an optimal data-dependent split of the isovolumic pressure fall interval. The residual regression errors of the methods are statistically compared in one hundred isolated working rat and one hundred guinea pig hearts, additionally including a logistic regression method. Regression error is significantly reduced by introducing b(tau). b(tau) is negative in most cases, indicating accelerated relaxation during isovolumic pressure fall, but zero and positive b(tau) are occasionally seen. Optimal interval tripartition further improves the regression error in most cases. The statistically proven acceleration of the time constant during isovolumic relaxation justifies the factor b(tau) as a direct and continuous measure of differences between early and late relaxation. This difference between early and late isovolumic relaxation is probably caused by residually contracted myocardium at the beginning of pressure fall, and is therefore important to describe pathophysiological effects on relaxation phases.
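For concreteness, the four-parameter model quoted above can be fitted by nonlinear least squares; this is an illustrative sketch with assumed pressure samples and starting values, not the study's regression code.

    # Hedged sketch: fit p(t) = p_inf + (p0 - p_inf)*exp(-t/(tau0 + b_tau*t)).
    import numpy as np
    from scipy.optimize import curve_fit

    def model(t, p_inf, p0, tau0, b_tau):
        return p_inf + (p0 - p_inf) * np.exp(-t / (tau0 + b_tau * t))

    def fit_relaxation(t, p):
        guess = [p[-1], p[0], 0.03, 0.0]   # tau0 guess in seconds; b_tau = 0
        params, _ = curve_fit(model, t, p, p0=guess)
        return dict(zip(['p_inf', 'p0', 'tau0', 'b_tau'], params))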
NASA Astrophysics Data System (ADS)
KIM, J.
2015-12-01
Aerosols play an important role in air quality in the short term and in climate change in the long term. In particular, understanding how aerosol optical depth (AOD) has changed to date is important for prognoses of the future atmospheric state and the radiation budget, both of which bear on human life. In this study, the trend of AOD at 550 nm from MODIS Aqua (MYD08) was estimated for the 10 years from 2004 to 2014 using a linear regression method and the ensemble empirical mode decomposition (EEMD) method. The study region was East Asia [18.5°N-51.5°N, 85.5°E-150.5°E], an area of great interest with respect to emission sources. The linear regression results show a markedly increasing trend in North and East China, including Sanjiang, Hailun, Beijing, Beijing forest and Jinozhou Bay, and a downward trend in the neighboring regions. However, AOD has an intrinsic seasonality and its trend is also consistently affected by external sources, so a non-linear trend analysis was conducted to examine how the AOD trends themselves change over time. The secular trends of AOD defined by EEMD showed broadly similar values over the entire region, but their shapes over time differ considerably from those of the linear regression. For example, the linear AOD trend in Beijing has increased monotonically [0.03% yr-1] since 2004, whereas its non-linear trend shows that the initial increase slackened and even turned downward from about 2010. Lastly, MODIS AOD was additionally validated against the AErosol RObotic NETwork (AERONET), showing fairly good agreement with the AERONET observations (R=0.901, RMSE=0.226, MAE=0.031, MBE=-0.001).
NASA Astrophysics Data System (ADS)
Saltogianni, Vasso; Stiros, Stathis
2012-11-01
The adjustment of systems of highly non-linear, redundant equations deriving from observations of certain geophysical processes and geodetic data cannot be based on conventional least-squares techniques and instead relies on various numerical inversion techniques. Still, these techniques can lead to solutions trapped in local minima, to correlated estimates and to solutions with poor error control. To overcome these problems, we propose an alternative numerical-topological approach inspired by lighthouse beacon navigation, usually used in 2-D, low-accuracy applications. In our approach, an m-dimensional grid G of points around the real solution (an m-dimensional vector) is first specified. Then, for each equation an uncertainty is assigned to the corresponding measurement, and the set of grid points satisfying the condition is detected. This process is repeated for all equations, and the common section A of the sets of grid points is defined. From this set of grid points, which defines a space including the real solution, we compute its center of weight, which corresponds to an estimate of the solution, and its variance-covariance matrix. An optimal solution can be obtained through optimization of the uncertainty in each observation. The efficiency of the overall process was assessed in comparison with conventional least-squares adjustment.
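A schematic sketch of the grid-and-intersection idea follows; the equation system, uncertainties and grid are placeholders for whatever geophysical model is being inverted.

    # Hedged sketch: keep grid points satisfying every |f_i(m) - d_i| <= u_i,
    # then report the center of weight and variance-covariance of that set.
    import numpy as np

    def topological_solve(equations, grid_points):
        # equations: list of (f, d, u); grid_points: (N, m) candidate vectors
        keep = np.ones(len(grid_points), dtype=bool)
        for f, d, u in equations:
            vals = np.array([f(p) for p in grid_points])
            keep &= np.abs(vals - d) <= u     # grid points satisfying eq. i
        A = grid_points[keep]                 # common section of all sets
        return A.mean(axis=0), np.cov(A, rowvar=False)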
Chang, Kang-Ming; Lo, Pei-Chen
2005-06-01
A lever-like EEG feature-extraction method based on the Hurst exponent and regression-fitting errors is proposed for identifying beta rhythms. The proposed method is superior to most methods using the time- and frequency-domain feature extraction parameters for identifying beta rhythms.
NASA Astrophysics Data System (ADS)
Denli, H. H.; Koc, Z.
2015-12-01
Valuation of real properties based on standards is difficult to apply consistently across time and location. Regression analysis constructs mathematical models that describe or explain relationships that may exist between variables. The problem of identifying price differences among properties in order to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Applied to real estate valuation, where properties are presented in the current market with their characteristics and quantifiers, regression analysis helps identify the factors or variables that are effective in the formation of value. In this study, prices of housing for sale in Zeytinburnu, a district of Istanbul, are associated with their characteristics to derive a price index, based on information retrieved from a real estate web page. The variables used for the analysis are age, size in m2, number of floors in the building, the floor on which the unit is located, and number of rooms. The price of the estate is the dependent variable, and the rest are independent variables. Prices of 60 real estates were used for the analysis. Locations of equal price were identified and plotted on the map, and equivalence curves were drawn delineating zones of equal value.
NASA Astrophysics Data System (ADS)
Shams Amiri, Shideh
Modeling of energy consumption in buildings is essential for applications such as building energy management and establishing baselines. This makes building energy consumption estimation a key tool for achieving goals on energy consumption and emissions reduction. The energy performance of a building is complex, since it depends on several parameters related to building characteristics, equipment and systems, weather, occupants, and sociological influences. This paper presents a new model to predict and quantify energy consumption in commercial buildings in the early stages of design. The eQUEST and DOE-2 building simulation software was used to build and simulate the individual building configurations, which were generated using a Monte Carlo simulation technique. Ten thousand simulations for seven building shapes were performed to create a comprehensive dataset covering the full ranges of the design parameters. The present study considered building materials, their thickness, building shape, and occupant schedule as design variables, since building energy performance is sensitive to these variables. The results of the energy simulations were then fitted with a set of regression equations to predict the energy consumption in each design scenario. The differences between regression-predicted and DOE-simulated annual building energy consumption are largely within 5%. It is envisioned that the developed regression models can be utilized to estimate the energy savings in the early stages of design, when different building schemes and design concepts are being considered. Keywords: eQUEST simulation, DOE-2 simulation, Monte Carlo simulation, Regression equations, Building energy performance
NASA Astrophysics Data System (ADS)
Zhang, Xiaoyu; Li, Qingbo; Zhang, Guangjun
2013-11-01
In this paper, a modified single-index signal regression (mSISR) method is proposed to construct a nonlinear and practical model with high accuracy. The mSISR method takes the optimal penalty tuning parameter from P-spline signal regression (PSR) as its initial tuning parameter and chooses the number of cycles by minimizing the root mean squared error of cross-validation (RMSECV). mSISR is superior to single-index signal regression (SISR) in terms of accuracy, computation time and convergence, and it characterizes the non-linearity between spectra and responses more precisely than SISR. Two spectral data sets from basic research experiments, covering nondestructive measurement of plant chlorophyll and noninvasive measurement of human blood glucose, are employed to illustrate the advantages of mSISR. The results indicate that the mSISR method (i) obtains a smooth and useful regression coefficient vector, (ii) explicitly exhibits the type and amount of non-linearity, (iii) can exploit nonlinear features of the signals to improve prediction performance and (iv) adapts well to complex spectral models in comparison with other calibration methods. It is validated that mSISR is a promising nonlinear modeling strategy for multivariate calibration.
Robertson, D.M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
Fragkaki, A G; Farmaki, E; Thomaidis, N; Tsantili-Kakoulidou, A; Angelis, Y S; Koupparis, M; Georgakopoulos, C
2012-09-21
The comparison among different modelling techniques, such as multiple linear regression, partial least squares and artificial neural networks, has been performed in order to construct and evaluate models for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. The performance of the quantitative structure-retention relationship study, using the multiple linear regression and partial least squares techniques, has been previously conducted. In the present study, artificial neural networks models were constructed and used for the prediction of relative retention times of anabolic androgenic steroids, while their efficiency is compared with that of the models derived from the multiple linear regression and partial least squares techniques. For overall ranking of the models, a novel procedure [Trends Anal. Chem. 29 (2010) 101-109] based on sum of ranking differences was applied, which permits the best model to be selected. The suggested models are considered useful for the estimation of relative retention times of designer steroids for which no analytical data are available.
Sharon Falcone Miller; Bruce G. Miller
2007-12-15
This paper compares the emissions factors for a suite of liquid biofuels (three animal fats, waste restaurant grease, pressed soybean oil, and a biodiesel produced from soybean oil) and four fossil fuels (i.e., natural gas, No. 2 fuel oil, No. 6 fuel oil, and pulverized coal) in Penn State's commercial water-tube boiler to assess their viability as fuels for green heat applications. The data were broken into two subsets, i.e., fossil fuels and biofuels. The regression model for the liquid biofuels (as a subset) did not perform well for all of the gases. In addition, the coefficient in the models showed the EPA method underestimating CO and NOx emissions. No relation could be studied for SO2 for the liquid biofuels as they contain no sulfur; however, the model showed a good relationship between the two methods for SO2 in the fossil fuels. AP-42 emissions factors for the fossil fuels were also compared to the mass balance emissions factors and EPA CFR Title 40 emissions factors. Overall, the AP-42 emissions factors for the fossil fuels did not compare well with the mass balance emissions factors or the EPA CFR Title 40 emissions factors. Regression analysis of the AP-42, EPA, and mass balance emissions factors for the fossil fuels showed a significant relationship only for CO2 and SO2. However, the regression models underestimate the SO2 emissions by 33%. These tests illustrate the importance of performing material balances around boilers to obtain the most accurate emissions levels, especially when dealing with biofuels. The EPA emissions factors were very good at predicting the mass balance emissions factors for the fossil fuels and to a lesser degree the biofuels. While the AP-42 emissions factors and EPA CFR Title 40 emissions factors are easier to perform, especially in large, full-scale systems, this study illustrated the shortcomings of estimation techniques. 23 refs., 3 figs., 8 tabs.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
NASA Astrophysics Data System (ADS)
Schreiber, P.; Forbrich, I.; Kutzbach, L.; Hormann, A.; Wolf, U.; Miglovec, M.; Pihlatie, M.; Christiansen, J. R.; Wilmking, M.
2009-04-01
Closed chambers are the most common method to determine methane (CH4) fluxes in peatlands. The concentration change over time is monitored, and the flux is usually calculated from the slope of a linear regression function. However, chambers tend to slow down gas diffusion by changing the concentration gradient between soil and atmosphere. Theoretically, this would result in a near-exponential concentration change in the chamber headspace. Here, we present data from a laboratory experiment and from two field campaigns on the basis of which we evaluate flux calculation approaches based either on linear or exponential regression models. To compare the fit performances of the two models, we used the Akaike Information Criterion with small-sample second-order bias correction (AICc). For checking the quality of flux data, we used the standard deviation of residuals. The calibration system in the laboratory experiment used during the chamber calibration campaign at Hyytiälä Forestry Field Station in August 2008 has been described by Pumpanen et al. (2004). Five different flux levels on two different soil porosities were tested. Preliminary results show that most concentration-over-time datasets were best described by the exponential model as evaluated by the AICc. It appeared that the flux calculation using the exponential model was better suited to determine the preset fluxes than that using the linear model. In the dataset of the first field campaign (April to October 2007) from Salmisuo (Finland, 62.46°N, 30.58°E), however, the majority of fluxes was best fitted with a linear regression on all microsite types. Those fluxes which are best fitted exponentially are most probably due to chamber artefacts. They occurred mostly during a drought period in August 2007, which seemed to increase the artificial impact of the chamber. However, these results might be site-specific: in Ust-Pojeg (Russia, 61.56°N, 50.13°E), where CH4 emissions are supposed to be
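A hedged sketch of the linear-versus-exponential comparison via AICc follows; the concentration samples and starting values are placeholders, and the constant term of the Gaussian log-likelihood is dropped because it cancels when models are compared on the same data.

    # Hedged sketch: compare linear and exponential headspace fits with AICc.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import linregress

    def aicc(rss, n, k):
        aic = n * np.log(rss / n) + 2 * k   # Gaussian likelihood, constants dropped
        return aic + 2 * k * (k + 1) / (n - k - 1)

    def compare_fits(t, c):
        lin = linregress(t, c)
        rss_lin = np.sum((c - (lin.intercept + lin.slope * t)) ** 2)
        expo = lambda t, c_inf, a, k: c_inf + a * np.exp(-k * t)
        p, _ = curve_fit(expo, t, c, p0=[c[-1], c[0] - c[-1], 0.01], maxfev=5000)
        rss_exp = np.sum((c - expo(t, *p)) ** 2)
        n = t.size
        return {'linear': aicc(rss_lin, n, 2), 'exponential': aicc(rss_exp, n, 3)}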
Babapour, R; Naghdi, R; Ghajar, I; Ghodsi, R
2015-07-01
The rock proportion of subsoil directly influences the cost of embankment in forest road construction. Therefore, developing a reliable framework for estimating the rock ratio prior to road planning could lead to lighter excavation and lower-cost operations. Prediction of rock proportion was subjected to statistical analyses using an Artificial Neural Network (ANN) in MATLAB and five link functions of ordinal logistic regression (OLR), according to rock type and terrain slope properties. In addition to bedrock and slope maps, more than 100 samples of rock proportion, observed by geologists, were collected from every available bedrock type in every slope class. Four predictive models were developed for rock proportion, employing the independent variables and applying both the selected probit link function of OLR and layer-recurrent and feed-forward back-propagation neural networks. In the ANN, different numbers of neurons were considered for the hidden layer(s). Goodness-of-fit measures showed that the ANN models produced better results than OLR, with R2 = 0.72 and root mean square error = 0.42. Furthermore, to show the applicability of the proposed approach and to illustrate the variability of rock proportion resulting from model application, the optimum models were applied to a mountainous forest where a forest road network had been constructed in the past.
Leite de Vasconcellos, M T; Portela, M C
2001-01-01
This paper focuses on the relationship between body mass index (BMI) and family energy intake, occupational energy expenditure, per capita family expenditure, sex, age, and left arm circumference for a group of Brazilian adults randomly selected among those interviewed for a survey on food consumption and family budgets, called the National Family Expenditure Survey. The authors discuss linear regression methodological issues related to treatment of outliers and influential cases, multicollinearity, model specification, heteroscedasticity, as well as the use of two-level variables derived from samples with complex design. The results indicate that the model is not affected by outliers and that there are no significant specification errors. They also show a significant linear relationship between BMI and the variables listed above. Although the hypothesis tests indicate significant heteroscedasticity, its corrections did not significantly change the model's parameters, probably due to the sample size (14,000 adults), making hypothesis tests more rigorous than desired. PMID:11784903
Azadi, Sama; Karimi-Jashni, Ayoub
2016-02-01
Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate. PMID:26482809
Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar
2016-01-01
Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of Pistacia atlantica (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal on various influential adsorption parameters was modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were a γ value of 0.76 and a σ2 of 0.15. For the test data set, a mean square error (MSE) of 0.0010 and a coefficient of determination (R2) of 0.976 were obtained for the LSSVR model, and an MSE of 0.0037 and an R2 of 0.897 for the MLR model. The adsorption equilibrium and kinetic data were found to be well fitted by, and in good agreement with, the Langmuir isotherm model, the second-order equation and intra-particle diffusion models, respectively. A small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is applicable for successful rapid removal of methyl orange (>95%). The maximum adsorption capacities for SnO2-NP-AC and AC-PAW were 250 mg g-1 and 125 mg g-1, respectively. PMID:26414425
Yano, Kentaro; Mita, Suzune; Morimoto, Kaori; Haraguchi, Tamami; Arakawa, Hiroshi; Yoshida, Miyako; Yamashita, Fumiyoshi; Uchida, Takahiro; Ogihara, Takuo
2015-09-01
P-glycoprotein (P-gp) regulates absorption of many drugs in the gastrointestinal tract and their accumulation in tumor tissues, but the basis of substrate recognition by P-gp remains unclear. Bitter-tasting phenylthiocarbamide, which stimulates taste receptor 2 member 38 (T2R38), increases P-gp activity and is a substrate of P-gp. This led us to hypothesize that bitterness intensity might be a predictor of P-gp-inhibitor/substrate status. Here, we measured the bitterness intensity of a panel of P-gp substrates and nonsubstrates with various taste sensors, and used multiple linear regression analysis to examine the relationship between P-gp-inhibitor/substrate status and various physical properties, including the intensity of bitter taste measured with the taste sensor. We calculated the first principal component analysis score (PC1) as the representative value of bitterness, as all taste sensors' outputs were significantly correlated. The P-gp substrates showed remarkably greater mean bitterness intensity than non-P-gp substrates. We found that the Km values of P-gp substrates were correlated with molecular weight, log P, and PC1 value, and the coefficient of determination (R2) of the linear regression equation was 0.63. This relationship might be useful as an aid to predict P-gp substrate status at an early stage of drug discovery.
Ibrahim, Joseph G.
2014-01-01
Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR. PMID:25309677
Lambert, Ronald J W; Mytilinaios, Ioannis; Maitland, Luke; Brown, Angus M
2012-08-01
This study describes a method to obtain parameter confidence intervals from the fitting of non-linear functions to experimental data, using the SOLVER and Analysis ToolPak Add-In of the Microsoft Excel spreadsheet. Previously, we have shown that Excel can fit complex multiple functions to biological data, obtaining values equivalent to those returned by more specialized statistical or mathematical software. However, a disadvantage of using the Excel method was the inability to return confidence intervals for the computed parameters or the correlations between them. Using a simple Monte-Carlo procedure within the Excel spreadsheet (without recourse to programming), SOLVER can provide parameter estimates (up to 200 at a time) for multiple 'virtual' data sets, from which the required confidence intervals and correlation coefficients can be obtained. The general utility of the method is exemplified by applying it to the analysis of the growth of Listeria monocytogenes, the growth inhibition of Pseudomonas aeruginosa by chlorhexidine and the further analysis of the electrophysiological data from the compound action potential of the rodent optic nerve.
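The same Monte Carlo idea, transplanted from Excel/SOLVER to Python for illustration: refit many 'virtual' data sets built from the fitted curve plus resampled residuals, then read confidence intervals and parameter correlations off the empirical distribution. The model f, data (t, y) and starting values are placeholders.

    # Hedged sketch: Monte Carlo confidence intervals for nonlinear fits.
    import numpy as np
    from scipy.optimize import curve_fit

    def monte_carlo_ci(f, t, y, p0, n_sets=200, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        p_hat, _ = curve_fit(f, t, y, p0=p0)
        resid = y - f(t, *p_hat)
        draws = np.array([curve_fit(f, t, f(t, *p_hat) +
                                    rng.choice(resid, size=resid.size),
                                    p0=p_hat)[0]
                          for _ in range(n_sets)])
        lo, hi = np.percentile(draws, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
        return p_hat, lo, hi, np.corrcoef(draws, rowvar=False)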
Parinet, Julien; Julien, Maxime; Nun, Pierrick; Robins, Richard J; Remaud, Gerald; Höhener, Patrick
2015-09-01
We aim at predicting the effect of structure and isotopic substitutions on the equilibrium vapour pressure isotope effect of various organic compounds (alcohols, acids, alkanes, alkenes and aromatics) at intermediate temperatures. We attempt to explore quantitative structure-property relationships using artificial neural networks (ANN), namely the multi-layer perceptron (MLP), and compare its performance with multi-linear regression (MLR). These approaches are based on the relationship between the molecular structure (organic chain, polar functions, type of functions, type of isotope involved) of the organic compounds and their equilibrium vapour pressure. A data set of 130 equilibrium vapour pressure isotope effects was used: 112 were used in the training set and the remaining 18 in the test/validation dataset. Two sets of descriptors were tested: a full set comprising the numbers of 12C, 13C, 16O, 18O, 1H and 2H atoms, OH functions, OD functions, CO functions, the Connolly Solvent Accessible Surface Area (CSA) and temperature; and a reduced set of descriptors. The dependent variable (the output) is the natural logarithm of the ratio of vapour pressures (ln R), expressed as light/heavy as in the classical literature. Since the database is rather small, the leave-one-out procedure was used to validate both models. Considering the higher determination coefficients and lower error values, it is concluded that the multi-layer perceptron provided better results than multi-linear regression. The stepwise regression procedure is a useful tool to reduce the number of descriptors. To our knowledge, a Quantitative Structure Property Relationship (QSPR) approach for isotopic studies is novel. PMID:25559176
Pollock, D.; Kim, K.; Gunst, R.; Schucany, W.
1993-05-01
Linear estimation of cold magnetic field quality based on warm multipole measurements is being considered as a quality control method for SSC production magnet acceptance. To investigate prediction uncertainties associated with such an approach, axial-scan (Z-scan) magnetic measurements from SSC Prototype Collider Dipole Magnets (CDMs) have been studied. This paper presents a preliminary evaluation of the explanatory ability of warm measurement multipole variation on the prediction of cold magnet multipoles. Two linear estimation methods are presented: least-squares regression, which uses the assumption of fixed independent variable (xi) observations, and the measurement error model, which includes measurement error in the xi's. The influence of warm multipole measurement errors on predicted cold magnet multipole averages is considered. MSD QA is studying warm/cold correlation to answer several magnet quality control questions. How well do warm measurements predict cold (2kA) multipoles? Does sampling error significantly influence estimates of the linear coefficients (slope, intercept and residual standard error)? Is estimation error for the predicted cold magnet average small compared to typical variation along the Z-axis? What fraction of the multipole RMS tolerance is accounted for by individual magnet prediction uncertainty?
Lunøe, Kristoffer; Martínez-Sierra, Justo Giner; Gammelgaard, Bente; Alonso, J Ignacio García
2012-03-01
The analytical methodology for the in vivo study of selenium metabolism using two enriched selenium isotopes has been modified, allowing for the internal correction of spectral interferences and mass bias both for total selenium and speciation analysis. The method is based on the combination of an already described dual-isotope procedure with a new data treatment strategy based on multiple linear regression. A metabolic enriched isotope ((77)Se) is given orally to the test subject and a second isotope ((74)Se) is employed for quantification. In our approach, all possible polyatomic interferences occurring in the measurement of the isotope composition of selenium by collision cell quadrupole ICP-MS are taken into account and their relative contribution calculated by multiple linear regression after minimisation of the residuals. As a result, all spectral interferences and mass bias are corrected internally allowing the fast and independent quantification of natural abundance selenium ((nat)Se) and enriched (77)Se. In this sense, the calculation of the tracer/tracee ratio in each sample is straightforward. The method has been applied to study the time-related tissue incorporation of (77)Se in male Wistar rats while maintaining the (nat)Se steady-state conditions. Additionally, metabolically relevant information such as selenoprotein synthesis and selenium elimination in urine could be studied using the proposed methodology. In this case, serum proteins were separated by affinity chromatography while reverse phase was employed for urine metabolites. In both cases, (74)Se was used as a post-column isotope dilution spike. The application of multiple linear regression to the whole chromatogram allowed us to calculate the contribution of bromine hydride, selenium hydride, argon polyatomics and mass bias on the observed selenium isotope patterns. By minimising the square sum of residuals for the whole chromatogram, internal correction of spectral interferences and mass
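As a simplified illustration of the regression step described above, the observed signal at each mass can be modelled as a combination of candidate isotope patterns; this sketch uses non-negative least squares as a stand-in for the paper's multiple linear regression, and the pattern matrix is a placeholder, not measured abundances.

    # Hedged sketch: decompose an observed isotope pattern into contributions
    # from candidate species (Se isotopes, SeH, BrH, Ar polyatomics, ...).
    import numpy as np
    from scipy.optimize import nnls

    def deconvolve(observed, patterns):
        # patterns: (n_masses, n_species) matrix of theoretical patterns
        coefs, resid_norm = nnls(patterns, observed)
        return coefs, resid_norm   # per-species contributions and misfit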
NASA Astrophysics Data System (ADS)
Ramoelo, A.; Skidmore, A. K.; Cho, M. A.; Mathieu, R.; Heitkönig, I. M. A.; Dudeni-Tlhone, N.; Schlerf, M.; Prins, H. H. T.
2013-08-01
Grass nitrogen (N) and phosphorus (P) concentrations are direct indicators of rangeland quality and provide imperative information for sound management of wildlife and livestock. It is challenging to estimate grass N and P concentrations using remote sensing in the savanna ecosystems. These areas are diverse and heterogeneous in soil and plant moisture, soil nutrients, grazing pressures, and human activities. The objective of the study is to test the performance of non-linear partial least squares regression (PLSR) for predicting grass N and P concentrations through integrating in situ hyperspectral remote sensing and environmental variables (climatic, edaphic and topographic). Data were collected along a land use gradient in the greater Kruger National Park region. The data consisted of: (i) in situ-measured hyperspectral spectra, (ii) environmental variables and measured grass N and P concentrations. The hyperspectral variables included published starch, N and protein spectral absorption features, red edge position, narrow-band indices such as simple ratio (SR) and normalized difference vegetation index (NDVI). The results of the non-linear PLSR were compared to those of conventional linear PLSR. Using non-linear PLSR, integrating in situ hyperspectral and environmental variables yielded the highest grass N and P estimation accuracy (R2 = 0.81, root mean square error (RMSE) = 0.08, and R2 = 0.80, RMSE = 0.03, respectively) as compared to using remote sensing variables only, and conventional PLSR. The study demonstrates the importance of an integrated modeling approach for estimating grass quality which is a crucial effort towards effective management and planning of protected and communal savanna ecosystems.
NASA Technical Reports Server (NTRS)
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
Cui, Haibo; Wei, Xiaomei; Huang, Yu; Hu, Bin; Fang, Yaping; Wang, Jia
2014-01-01
Among human influenza viruses, strain A/H3N2 accounts for over a quarter of a million deaths annually. Antigenic variants of these viruses often render current vaccinations ineffective and lead to repeated infections. In this study, a computational model was developed to predict antigenic variants of the A/H3N2 strain. First, 18 critical antigenic amino acids in the hemagglutinin (HA) protein were recognized using a scoring method combining phi (ϕ) coefficient and information entropy. Next, a prediction model was developed by integrating multiple linear regression method with eight types of physicochemical changes in critical amino acid positions. When compared to other three known models, our prediction model achieved the best performance not only on the training dataset but also on the commonly-used testing dataset composed of 31878 antigenic relationships of the H3N2 influenza virus.
NASA Astrophysics Data System (ADS)
Huttunen, Jani; Kokkola, Harri; Mielonen, Tero; Esa Juhani Mononen, Mika; Lipponen, Antti; Reunanen, Juha; Vilhelm Lindfors, Anders; Mikkonen, Santtu; Erkki Juhani Lehtinen, Kari; Kouremeti, Natalia; Bais, Alkiviadis; Niska, Harri; Arola, Antti
2016-07-01
In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during
Baird, Jim; Curry, Robin; Reid, Tim
2013-03-01
This article describes the development and application of a multiple linear regression model to identify how the key elements of waste and recycling infrastructure, namely container capacity and frequency of collection, affect the yield from municipal kerbside recycling programmes. The overall aim of the research was to gain an understanding of the factors affecting the yield from municipal kerbside recycling programmes in Scotland with an underlying objective to evaluate the efficacy of the model as a decision-support tool for informing the design of kerbside recycling programmes. The study isolates the principal kerbside collection service offered by all 32 councils across Scotland, eliminating those recycling programmes associated with flatted properties or multi-occupancies. The results of the regression analysis model have identified three principal factors which explain 80% of the variability in the average yield of the principal dry recyclate services: weekly residual waste capacity, number of materials collected and the weekly recycling capacity. The use of the model has been evaluated and recommendations made on ongoing methodological development and the use of the results in informing the design of kerbside recycling programmes. We hope that the research can provide insights for the further development of methods to optimise the design and operation of kerbside recycling programmes.
NASA Astrophysics Data System (ADS)
Povak, Nicholas A.; Hessburg, Paul F.; McDonnell, Todd C.; Reynolds, Keith M.; Sullivan, Timothy J.; Salter, R. Brion; Cosby, Bernard J.
2014-04-01
Accurate estimates of soil mineral weathering are required for regional critical load (CL) modeling to identify ecosystems at risk of the deleterious effects from acidification. Within a correlative modeling framework, we used modeled catchment-level base cation weathering (BCw) as the response variable to identify key environmental correlates and predict a continuous map of BCw within the southern Appalachian Mountain region. More than 50 initial candidate predictor variables were submitted to a variety of conventional and machine learning regression models. Predictors included aspects of the underlying geology, soils, geomorphology, climate, topographic context, and acidic deposition rates. Low BCw rates were predicted in catchments with low precipitation, siliceous lithology, low soil clay, nitrogen and organic matter contents, and relatively high levels of canopy cover in mixed deciduous and coniferous forest types. Machine learning approaches, particularly random forest modeling, significantly improved model prediction of catchment-level BCw rates over traditional linear regression, with higher model accuracy and lower error rates. Our results confirmed findings from other studies, but also identified several influential climatic predictor variables, interactions, and nonlinearities among the predictors. Results reported here will be used to support regional sulfur critical loads modeling to identify areas impacted by industrially derived atmospheric S inputs. These methods are readily adapted to other regions where accurate CL estimates are required over broad spatial extents to inform policy and management decisions.
Naguib, Ibrahim A; Abdelaleem, Eglal A; Zaazaa, Hala E; Hussein, Essraa A
2016-07-01
Two multivariate chemometric models, namely, partial least-squares regression (PLSR) and linear support vector regression (SVR), are presented for the analysis of amoxicillin trihydrate and dicloxacillin sodium in the presence of their common impurity (6-aminopenicillanic acid) in raw materials and in pharmaceutical dosage form via handling UV spectral data and making a modest comparison between the two models, highlighting the advantages and limitations of each. For optimum analysis, a three-factor, four-level experimental design was established, resulting in a training set of 16 mixtures containing different ratios of interfering species. To validate the prediction ability of the suggested models, an independent test set consisting of eight mixtures was used. The presented results show the ability of the two proposed models to determine the two drugs simultaneously in the presence of small levels of the common impurity with high accuracy and selectivity. The analysis results of the dosage form were statistically compared to a reported HPLC method, with no significant difference regarding accuracy and precision, indicating the ability of the suggested multivariate calibration models to be reliable and suitable for routine analysis of the drug product. Compared to the PLSR model, the SVR model gives more accurate results with a lower prediction error, as well as high generalization ability; however, the PLSR model is easy to handle and fast to optimize. PMID:27305461
NASA Astrophysics Data System (ADS)
Adamowski, Jan; Fung Chan, Hiu; Prasher, Shiv O.; Ozga-Zielinski, Bogdan; Sliusarieva, Anna
2012-01-01
Daily water demand forecasts are an important component of cost-effective and sustainable management and optimization of urban water supply systems. In this study, a method based on coupling discrete wavelet transforms (WA) and artificial neural networks (ANNs) for urban water demand forecasting applications is proposed and tested. Multiple linear regression (MLR), multiple nonlinear regression (MNLR), autoregressive integrated moving average (ARIMA), ANN and WA-ANN models for urban water demand forecasting at lead times of one day for the summer months (May to August) were developed, and their relative performance was compared using the coefficient of determination, root mean square error, relative root mean square error, and efficiency index. The key variables used to develop and validate the models were daily total precipitation, daily maximum temperature, and daily water demand data from 2001 to 2009 in the city of Montreal, Canada. The WA-ANN models were found to provide more accurate urban water demand forecasts than the MLR, MNLR, ARIMA, and ANN models. The results of this study indicate that coupled wavelet-neural network models are a potentially promising new method of urban water demand forecasting that merit further study.
Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo
2011-03-01
Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. PMID:21334477
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Huang, Dong; Cabral, Ricardo; De la Torre, Fernando
2016-02-01
Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features (X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to note that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740
Leffondré, Karen; Jager, Kitty J; Boucquemont, Julie; Stel, Vianda S; Heinze, Georg
2014-10-01
Regression models are being used to quantify the effect of an exposure on an outcome, while adjusting for potential confounders. While the type of regression model to be used is determined by the nature of the outcome variable, e.g. linear regression has to be applied for continuous outcome variables, all regression models can handle any kind of exposure variables. However, some fundamentals of representation of the exposure in a regression model and also some potential pitfalls have to be kept in mind in order to obtain meaningful interpretation of results. The objective of this educational paper was to illustrate these fundamentals and pitfalls, using various multiple regression models applied to data from a hypothetical cohort of 3000 patients with chronic kidney disease. In particular, we illustrate how to represent different types of exposure variables (binary, categorical with two or more categories and continuous), and how to interpret the regression coefficients in linear, logistic and Cox models. We also discuss the linearity assumption in these models, and show how wrongly assuming linearity may produce biased results and how flexible modelling using spline functions may provide better estimates.
NASA Astrophysics Data System (ADS)
Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.
2016-06-01
Diameter-at-breast-height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR technology provides a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of the point cloud, which are unique to different forest classes. An extensive tree inventory was done on an established two-hectare sample plot in a natural growth forest on Mt. Makiling, Laguna. Coordinates, height, and canopy cover were measured, and species were identified, for comparison with LiDAR derivatives. Multiple linear regression was used to obtain LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated and evaluated automatically using Python scripts with regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yields the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was taken as the best equation. The best equation uses 11 parameters at the 10 m grid size, with an r-squared of 0.604, an AIC of 154.04 and a BIC of 175.08. The best combination of parameters may differ among forest classes, which warrants further study. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's test of sphericity (BTS), can be used to help assess the correlation among parameters.
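As a concrete illustration of the exhaustive search described above, the sketch below scores every predictor subset of a multiple linear regression by AIC, BIC and R-squared. It is a minimal sketch assuming a pandas DataFrame with hypothetical column names, and it uses statsmodels rather than the study's own scripts; with many predictors the enumeration grows combinatorially, so it is only practical for modest subset sizes.

```python
# Exhaustive best-subset search for a multiple linear regression,
# scored by AIC, BIC and R-squared (hypothetical data layout).
from itertools import combinations

import pandas as pd
import statsmodels.api as sm

def best_subset(df: pd.DataFrame, response: str, max_size: int = 3):
    """Fit OLS on every predictor combination up to max_size and
    return (score, subset, fitted model) for the lowest AIC."""
    predictors = [c for c in df.columns if c != response]
    y = df[response]
    best = None
    for k in range(1, max_size + 1):
        for subset in combinations(predictors, k):
            X = sm.add_constant(df[list(subset)])
            fit = sm.OLS(y, X).fit()
            key = (fit.aic, fit.bic, -fit.rsquared)  # lower is better
            if best is None or key < best[0]:
                best = (key, subset, fit)
    return best
```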
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A
2010-11-15
This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of the chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-point sin x(i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by that of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely overlapping chromatographic peaks and very low analyte concentrations. For example, in the case of overlapping peaks, the correlation coefficient of sodium benzoate improved from 0.9975 to 0.9998 on moving from the conventional peak-area method to the first-derivative-under-Fourier-functions method. A significant improvement in precision and accuracy for the determination of synthetic mixtures and dosage forms in non-ideal cases was also achieved. For example, in the case of overlapping peaks, the mean recovery% and RSD% of guaiphenesin went from 91.57 and 9.83 to 100.04 and 0.78, respectively, on moving from the conventional peak-area method to the first-derivative-under-Fourier-functions method. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and that they might not be normally distributed.
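For the Theil-versus-least-squares comparison made in this abstract (and in the 2013 follow-up further below), a small illustration with synthetic numbers shows how a single outlying point pulls the least-squares line but barely moves the Theil estimate; scipy exposes Theil's estimator as `theilslopes`.

```python
# Theil's nonparametric slope versus ordinary least squares on
# synthetic calibration-style data with one outlier.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.9, 2.1, 2.9, 4.2, 5.1, 9.0])   # last response is an outlier

ls_slope, ls_intercept = np.polyfit(x, y, 1)              # least squares
th_slope, th_intercept, lo, hi = stats.theilslopes(y, x)  # Theil's method

print(f"least squares: y = {ls_slope:.3f}x + {ls_intercept:.3f}")
print(f"Theil:         y = {th_slope:.3f}x + {th_intercept:.3f}")
```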
Lee, Paul H
2016-01-01
Healthy adults are advised to perform at least 150 min of moderate-intensity physical activity weekly, but this advice is based on studies using self-reports of questionable validity. This study examined the dose-response relationship of accelerometer-measured physical activity and sedentary behaviors with all-cause mortality, using segmented Cox regression to empirically determine the break-points of the dose-response relationship. Data from 7006 adult participants aged 18 or above in the National Health and Nutrition Examination Survey waves 2003-2004 and 2005-2006 were included in the analysis and linked with death certificate data using a probabilistic matching approach in the National Death Index through December 31, 2011. Physical activity and sedentary behavior were measured using an ActiGraph model 7164 accelerometer worn over the right hip for 7 consecutive days. Each minute with an accelerometer count <100, 1952-5724, or ≥5725 was classified as sedentary, moderate-intensity physical activity, or vigorous-intensity physical activity, respectively. Segmented Cox regression was used to estimate the hazard ratio (HR) of time spent in sedentary behaviors, moderate-intensity physical activity, and vigorous-intensity physical activity for all-cause mortality, adjusted for demographic characteristics, health behaviors, and health conditions. Data were analyzed in 2016. During 47,119 person-years of follow-up, 608 deaths occurred. Each additional hour per day of sedentary behavior was associated with an HR of 1.15 (95% CI 1.01, 1.31) among participants who spend at least 10.9 h per day on sedentary behaviors, and each additional minute per day spent on moderate-intensity physical activity was associated with an HR of 0.94 (95% CI 0.91, 0.96) among participants with daily moderate-intensity physical activity ≤14.1 min. Associations of moderate physical activity and sedentary behaviors with all-cause mortality were independent of each other.
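The break-point search behind a segmented Cox model can be sketched as a grid search over candidate break-points, refitting a hinge-parameterised Cox model at each candidate and keeping the fit with the highest partial log-likelihood. This is a sketch only, assuming the lifelines package and hypothetical column names ('time', 'event', 'sed_hours'); the paper's exact estimation procedure may differ.

```python
# Grid search for a single break-point in a segmented Cox regression.
import numpy as np
from lifelines import CoxPHFitter

def fit_segmented_cox(df, exposure="sed_hours",
                      candidates=np.arange(6.0, 14.0, 0.5)):
    best_bp, best_ll, best_fit = None, -np.inf, None
    for bp in candidates:
        d = df[["time", "event"]].copy()
        d["below"] = np.minimum(df[exposure], bp)        # slope below bp
        d["above"] = np.maximum(df[exposure] - bp, 0.0)  # slope above bp
        cph = CoxPHFitter().fit(d, duration_col="time", event_col="event")
        if cph.log_likelihood_ > best_ll:
            best_bp, best_ll, best_fit = bp, cph.log_likelihood_, cph
    return best_bp, best_fit
```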
Yu, Jianwei; Liu, Juan; An, Wei; Wang, Yongjing; Zhang, Junzhi; Wei, Wei; Su, Ming; Yang, Min
2015-01-01
A total of 86 source water samples from 38 cities across major watersheds of China were collected for a bromide (Br(-)) survey, and the bromate (BrO3(-)) formation potentials (BFPs) of 41 samples with Br(-) concentration >20 μg L(-1) were evaluated using a batch ozonation reactor. Statistical analyses indicated that higher alkalinity, hardness, and pH of water samples could lead to higher BFPs, with alkalinity as the most important factor. Based on the survey data, a multiple linear regression (MLR) model including three parameters (alkalinity, ozone dose, and total organic carbon (TOC)) was established with a relatively good prediction performance (model selection criterion = 2.01, R(2) = 0.724), using logarithmic transformation of the variables. Furthermore, a contour plot was used to interpret the influence of alkalinity and TOC on BrO3(-) formation with prediction accuracy as high as 71%, suggesting that these two parameters, apart from ozone dosage, were the most important ones affecting the BFPs of source waters with Br(-) concentration >20 μg L(-1). The model could be a useful tool for the prediction of the BFPs of source water.
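The model form described here, an MLR on logarithmically transformed variables, can be sketched as follows; the numbers are invented placeholders, not the survey data, and the column names are assumptions.

```python
# Log-log multiple linear regression: alkalinity, ozone dose and TOC
# as predictors of bromate formation potential (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "alkalinity": [45, 120, 200, 80, 150, 95],   # mg/L as CaCO3
    "ozone_dose": [1.0, 2.0, 1.5, 2.5, 1.2, 1.8],  # mg/L
    "toc":        [2.1, 3.5, 1.8, 4.0, 2.7, 3.1],  # mg/L
    "bfp":        [12, 55, 40, 48, 35, 44],        # ug/L
})

X = sm.add_constant(np.log10(data[["alkalinity", "ozone_dose", "toc"]]))
y = np.log10(data["bfp"])
fit = sm.OLS(y, X).fit()
print(fit.params, fit.rsquared)
bfp_pred = 10 ** fit.predict(X)   # back-transform to original units
```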
Caselli; Daniele; Mangone; Paolillo
2000-01-15
The apparent pK(a) of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Extended principal-component analysis allows the precise determination of the apparent pK(a) and of the spectra of the acid and base forms of the dye. Combination with multiple linear regression increases the precision. The pK(a) of 7-hydroxycoumarin (umbelliferone) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at various different water/surfactant ratios. The spectra of the acid and base forms of the dye in the microemulsion are very similar to those in bulk water in the presence of Tris and ammonia. The presence of carbonate changes somewhat the spectrum of the acid form. Results are discussed taking into account the profile of the electrostatic potential drop in the water pool and the possible partition of umbelliferone between the aqueous core and the surfactant. The pK(a) values corrected for these effects are independent of w(0) and are close to the value of the pK(a) in bulk water. Copyright 2000 Academic Press.
Caselli, Maurizio; Mangone, Annarosa; Paolillo, Paola; Traini, Angela
2002-01-01
The pKa of 3',3",5',5"tetrabromo-m-cresolsulfonephtalein (Bromocresol Green) and o-cresolsulphonephtalein (Cresol Red) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at different water/surfactant ratios. Extended Principal Component Analysis was used for a precise determination of the apparent pKa and of the spectra of the acid and base forms of the dye. The apparent pKa of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Combination with multiple linear regression increases the precision. Results are discussed taking into account the profile of the electrostatic potential in the water pool and the possible partition of the indicator between the aqueous core and the surfactant. The pKa corrected for these effects are independent of w0 and are close to the value of the pKa in bulk water. On the basis of a tentative hypothesis it is possible to calculate the true pKa of the buffer in the pool.
Kim, Youngho
2015-01-01
Introduction. As blood concentration measurements of commonly abused alcohols are readily available, an equation was proposed in a previous publication to predict the change in their concentrations. Here, the change in ethylene glycol (EG) concentration was studied in a case of intoxication to estimate the time required for hemodialysis (HD) using linear regression. Case Report. A 55-year-old female with a past medical history of seizure disorder, bipolar disorder, and chronic pain was admitted due to severe agitation. The patient was noted to have metabolic acidosis with an elevated anion gap and acute kidney injury, which prompted blood concentration measurement of commonly abused alcohols. Her initial EG concentration was 26.45 mmol/L. Fomepizole therapy was initiated, soon followed by HD to enhance clearance. Discussion. Plotting the natural logarithm of EG concentration over time showed that EG elimination follows first-order kinetics and predicts the change in its concentration well. Pharmacokinetic review revealed minimal elimination of EG by alcohol dehydrogenase (ADH), which could be related to a genetic predisposition in ADH activity and home medications, as well as the presence of propylene glycol. The pharmacokinetics of EG are relatively well studied, with published parameters. Consideration and application of pharmacokinetics could assist in the management of EG intoxication, including HD planning. PMID:26346463
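The first-order workup described in the case is a straight line in ln(concentration) versus time, so the time to reach a target concentration falls out of an ordinary linear fit. Apart from the reported initial value of 26.45 mmol/L, the concentrations, sampling times and target level below are invented for illustration.

```python
# First-order elimination: regress ln(C) on time, then solve for the
# time at which concentration falls below a target level.
import numpy as np

t = np.array([0.0, 4.0, 8.0, 12.0])       # hours (hypothetical sampling times)
c = np.array([26.45, 20.1, 15.3, 11.6])   # mmol/L (only the first mirrors the case)

slope, intercept = np.polyfit(t, np.log(c), 1)  # ln C = intercept + slope * t
k = -slope                                      # first-order rate constant
target = 3.2                                    # mmol/L, assumed endpoint
t_target = (intercept - np.log(target)) / k
print(f"k = {k:.3f} per h; {target} mmol/L reached at t = {t_target:.1f} h")
```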
NASA Astrophysics Data System (ADS)
Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.
2016-06-01
Ailments related to urbanization, such as heat stress, are an increasingly prevalent threat. Measures such as green roofs or tree planting can lessen the effect of urbanization on surface temperature, so land use matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in a mathematical model is important because it provides a way to predict LST from LULC alone. This study aims to examine the relationship between LST and LULC and to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image, and the LULC classification was derived from LiDAR and orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs, and these metrics were analysed within a statistical framework. Multiple linear regression was used to create models predicting LST for each class, and the spatial metric "Effective mesh size" was found to be a top predictor of LST in 6 out of 7 classes. The model can be refined further by adding a temporal aspect: analysing the LST of another farming period (for rural areas) and looking for common predictors between the LSTs of the two farming periods.
McGrane, Scott J; Tetzlaff, Doerthe; Soulsby, Chris
2014-11-01
Faecal coliform (FC) bacteria were used as a proxy for faecal indicator organisms (FIOs) to assess the microbiological pollution risk for eight mesoscale catchments with increasing lowland influence across north-east Scotland. This study sought to assess the impact of urban areas on microbial contaminant fluxes. Fluxes were lowest in upland catchments where populations are relatively low. By contrast, lowland catchments with larger settlements and larger grazing livestock populations have elevated FC concentrations throughout the year. Peak FC counts occurred during the summer months (April-September) when biological activity is at its highest. Lowland catchments experience high FC concentrations throughout the year, whereas upland catchments exhibit more seasonal variation, with elevated summer conditions and reduced winter concentrations. A simple linear regression model based on catchment characteristics provided scope to predict FC fluxes. Percentage of improved grazing pasture and human population explained 90% and 62% of the variation in mean annual FC concentrations, respectively. This approach provides scope for an initial screening tool to predict the impact of urban space and agricultural practice on FC concentrations at the catchment scale and can aid in pragmatic planning and water quality improvement decisions. However, greater understanding of the short-term dynamics is still required, and would benefit from higher-resolution sampling than was undertaken here.
Computing measures of explained variation for logistic regression models.
Mittlböck, M; Schemper, M
1999-01-01
The proportion of explained variation (R2) is frequently used in the general linear model but in logistic regression no standard definition of R2 exists. We present a SAS macro which calculates two R2-measures based on Pearson and on deviance residuals for logistic regression. Also, adjusted versions for both measures are given, which should prevent the inflation of R2 in small samples. PMID:10195643
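A Python sketch of the two measures the macro computes: a Pearson-residual-based R2 (squared-error form) and a deviance-based R2 (one minus the deviance ratio). The adjusted small-sample versions mentioned in the abstract are omitted, and the exact macro formulas may differ in detail.

```python
# Two explained-variation measures for logistic regression.
import numpy as np
import statsmodels.api as sm

def logistic_r2(y, X):
    Xc = sm.add_constant(X)
    fit = sm.Logit(y, Xc).fit(disp=0)
    p = fit.predict(Xc)
    r2_pearson = 1 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)
    r2_deviance = 1 - fit.llf / fit.llnull   # = 1 - D_model / D_null
    return r2_pearson, r2_deviance
```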
Feng, Danqi; Xie, Heng; Qian, Lifen; Bai, Qinhong; Sun, Junqiang
2015-06-29
We experimentally demonstrate a novel approach to microwave frequency measurement utilizing the birefringence effect in a highly non-linear fiber (HNLF). A detailed theoretical analysis is presented to implement an adjustable measurement range and resolution. By stimulating a complementary polarization-domain interferometer pair in the HNLF, a mathematical expression relating the microwave frequency to an amplitude comparison function is developed. We carried out a proof-of-concept experiment: a frequency measurement range of 2.5-30 GHz with a measurement error within 0.5 GHz is achieved, except in the 16-17.5 GHz band. This method is all-optical and requires no high-speed electronic components. PMID:26191769
Peluso, Marco E M; Munnia, Armelle; Ceppi, Marcello
2014-11-01
Exposures to bisphenol-A, a weak estrogenic chemical largely used for the production of plastic containers, can affect rodent behaviour. We therefore examined the relationships between bisphenol-A and anxiety-like behaviour, spatial skills, and aggressiveness in 12 toxicity studies of rodent offspring from females orally exposed to bisphenol-A while pregnant and/or lactating, by median and linear spline analyses. Subsequently, meta-regression analysis was applied to quantify the behavioural changes. U-shaped, inverted U-shaped and J-shaped dose-response curves were found to describe the relationships between bisphenol-A and the behavioural outcomes. The occurrence of anxiogenic-like effects and spatial skill changes displayed U-shaped and inverted U-shaped curves, respectively, providing examples of effects that are observed at low doses. Conversely, a J-shaped dose-response relationship was observed for aggressiveness. When the proportion of rodents expressing certain traits, or the time that they took to manifest an attitude, was analysed, the meta-regression indicated a borderline-significant increment of anxiogenic-like effects at low doses regardless of sex (β = -0.8%, 95% C.I. -1.7/0.1, P = 0.076, at ≤120 μg bisphenol-A), whereas only bisphenol-A-exposed males exhibited a significant inhibition of spatial skills (β = 0.7%, 95% C.I. 0.2/1.2, P = 0.004, at ≤100 μg/day). A significant increment of aggressiveness was observed in both sexes (β = 67.9, C.I. 3.4/172.5, P = 0.038, at >4.0 μg). Bisphenol-A treatments also significantly abrogated spatial learning and ability in males (P < 0.001 vs. females). Overall, our study showed that developmental exposures to low doses of bisphenol-A, e.g. ≤120 μg/day, were associated with behavioural aberrations in offspring.
NASA Astrophysics Data System (ADS)
Liberman, Neomi; Ben-David Kolikant, Yifat; Beeri, Catriel
2012-09-01
Due to a program reform in Israel, experienced CS high-school teachers faced the need to master and teach a new programming paradigm. This situation served as an opportunity to explore the relationship between teachers' content knowledge (CK) and their pedagogical content knowledge (PCK). This article focuses on three case studies, with emphasis on one of them. Using observations and interviews, we examine how the teachers we observed taught, and what development of their teaching occurred as a result of their teaching experience, if at all. Our findings suggest that this situation creates a new hybrid state of teachers, which we term "regressed experts." These teachers incorporate in their professional practice some elements typical of novices and some typical of experts. We also found that these teachers' experience, although established while teaching a different CK, serves as leverage to improve their knowledge and understanding of aspects of the new content.
NASA Astrophysics Data System (ADS)
Barbu, N.; Cuculeanu, V.; Stefan, S.
2016-10-01
The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. In order to achieve this, two catalogues from COST733Action were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and utilized as predictors within the multiple linear regression model (MLRM) for the estimation of winter and summer TX90p values for 85 synoptic stations covering the whole of Romania. A forward selection procedure was utilized to find adequate predictor combinations, and those predictor combinations were tested for collinearity. The performance of the MLRMs was quantified based on the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level in order to obtain reliable evidence of MLRM robustness. From this analysis, it can be stated that the MLRM performance is higher in winter than in summer. This is due to the annual cycle of incoming insolation and to local factors such as orography and surface albedo variations. The MLRM performances exhibit distinct variations between regions, with high performance in wintertime for the eastern and southern parts of the country and in summertime for the western part of the country. One can conclude that the MLRM generally captures the TX90p variability quite well and reveals the potential for statistical downscaling of TX90p values based on circulation types.
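The selection-plus-validation workflow described above can be sketched with scikit-learn: greedy forward selection on explained variance, then leave-one-out cross-validation of the chosen model. X and y are assumed to be a DataFrame of circulation-type frequencies and the TX90p series; the real procedure also screened predictor combinations for collinearity, which this sketch omits.

```python
# Forward predictor selection followed by leave-one-out cross-validation.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

def forward_select(X, y, max_terms=4):
    chosen, remaining = [], list(X.columns)
    while remaining and len(chosen) < max_terms:
        score = {c: LinearRegression().fit(X[chosen + [c]], y)
                                      .score(X[chosen + [c]], y)
                 for c in remaining}
        best = max(score, key=score.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# selected = forward_select(X, y)
# loo_mse = -cross_val_score(LinearRegression(), X[selected], y,
#                            cv=LeaveOneOut(),
#                            scoring="neg_mean_squared_error").mean()
```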
Herrig, Ilona M; Böer, Simone I; Brennholt, Nicole; Manz, Werner
2015-11-15
Since rivers are typically subject to rapid changes in microbiological water quality, tools are needed to allow timely water quality assessment. A promising approach is the application of predictive models. In our study, we developed multiple linear regression (MLR) models in order to predict the abundance of the fecal indicator organisms Escherichia coli (EC), intestinal enterococci (IE) and somatic coliphages (SC) in the Lahn River, Germany. The models were developed on the basis of an extensive set of environmental parameters collected during a 12-months monitoring period. Two models were developed for each type of indicator: 1) an extended model including the maximum number of variables significantly explaining variations in indicator abundance and 2) a simplified model reduced to the three most influential explanatory variables, thus obtaining a model which is less resource-intensive with regard to required data. Both approaches have the ability to model multiple sites within one river stretch. The three most important predictive variables in the optimized models for the bacterial indicators were NH4-N, turbidity and global solar irradiance, whereas chlorophyll a content, discharge and NH4-N were reliable model variables for somatic coliphages. Depending on indicator type, the extended mode models also included the additional variables rainfall, O2 content, pH and chlorophyll a. The extended mode models could explain 69% (EC), 74% (IE) and 72% (SC) of the observed variance in fecal indicator concentrations. The optimized models explained the observed variance in fecal indicator concentrations to 65% (EC), 70% (IE) and 68% (SC). Site-specific efficiencies ranged up to 82% (EC) and 81% (IE, SC). Our results suggest that MLR models are a promising tool for a timely water quality assessment in the Lahn area. PMID:26318647
Hu, L.; Liang, M.; Mouraux, A.; Wise, R. G.; Hu, Y.
2011-01-01
Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLRd) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLRd method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLRd approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLRd effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLRd can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli. PMID:21880936
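The wavelet-filtering step can be sketched generically with PyWavelets as soft-thresholding of the detail coefficients; the wavelet choice, decomposition level and threshold below are common defaults, not the parameters of the paper, and the MLRd step is omitted.

```python
# Generic wavelet denoising of a noisy single-trial signal.
import numpy as np
import pywt

def wavelet_filter(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # noise scale estimate
    thr = sigma * np.sqrt(2 * np.log(len(signal)))    # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]
```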
Callén, M S; López, J M; Mastral, A M
2010-08-15
The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view, especially with the introduction of Directive 2004/107/EC and due to the carcinogenic character of this pollutant. Samples of particulate matter less than or equal to 10 microns (PM10), collected during a 2008-2009 sampling campaign at four locations in Spain, were used to determine BaP concentrations experimentally by gas chromatography tandem mass spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations at two sampling sites, taking PM10 and meteorological variables as possible predictors. The model obtained with data from the two sampling sites (all-sites model) (R(2)=0.817, PRESS/SSY=0.183) included the significant variables PM10, temperature, solar radiation and wind speed, and was internally and externally validated. The first validation was performed by cross-validation and the second by comparison with BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation for estimating BaP concentrations in urban atmospheres, with very good internal prediction (Q(CV)(2)=0.813, PRESS/SSY=0.187) and with maximal external prediction for the 2001-2002 campaign (Q(ext)(2)=0.679, PRESS/SSY=0.321) versus the 2001-2004 campaign (Q(ext)(2)=0.551, PRESS/SSY=0.449).
Esbaugh, A J; Brix, K V; Mager, E M; De Schamphelaere, K; Grosell, M
2012-03-01
The current study examined the chronic toxicity of lead (Pb) to three invertebrate species: the cladoceran Ceriodaphnia dubia, the snail Lymnaea stagnalis and the rotifer Philodina rapida. The test media consisted of natural waters from across North America, varying in pertinent water chemistry parameters including dissolved organic carbon (DOC), calcium, pH and total CO(2). Chronic toxicity was assessed using reproductive endpoints for C. dubia and P. rapida while growth was assessed for L. stagnalis, with chronic toxicity varying markedly according to water chemistry. A multi-linear regression (MLR) approach was used to identify the relative importance of individual water chemistry components in predicting chronic Pb toxicity for each species. DOC was an integral component of MLR models for C. dubia and L. stagnalis, but surprisingly had no predictive impact on chronic Pb toxicity for P. rapida. Furthermore, sodium and total CO(2) were also identified as important factors affecting C. dubia toxicity; no other factors were predictive for L. stagnalis. The Pb toxicity of P. rapida was predicted by calcium and pH. The predictive power of the C. dubia and L. stagnalis MLR models was generally similar to that of the current C. dubia BLM, with R(2) values of 0.55 and 0.82 for the respective MLR models, compared to 0.45 and 0.79 for the respective BLMs. In contrast the BLM poorly predicted P. rapida toxicity (R(2)=0.19), as compared to the MLR (R(2)=0.92). The cross species variability in the effects of water chemistry, especially with respect to rotifers, suggests that cross species modeling of invertebrate chronic Pb toxicity using a C. dubia model may not always be appropriate.
Improved Regression Calibration
ERIC Educational Resources Information Center
Skrondal, Anders; Kuha, Jouni
2012-01-01
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…
Semiparametric Regression Pursuit.
Huang, Jian; Wei, Fengrong; Ma, Shuangge
2012-10-01
The semiparametric partially linear model allows flexible modeling of covariate effects on the response variable in regression. It combines the flexibility of nonparametric regression with the parsimony of linear regression. The most important assumption in the existing estimation methods for this model is that it is known a priori which covariates have a linear effect and which do not. In applied work, however, this is rarely known in advance. We consider the problem of estimation in partially linear models without assuming a priori which covariates have linear effects. We propose a semiparametric regression pursuit method for identifying the covariates with a linear effect. Our proposed method is a penalized regression approach using a group minimax concave penalty. Under suitable conditions we show that the proposed approach is model-pursuit consistent, meaning that it can correctly determine which covariates have a linear effect and which do not with high probability. The performance of the proposed method is evaluated using simulation studies, which support our theoretical results. A real data example is used to illustrate the application of the proposed method. PMID:23559831
Khan, M K I; Naznin, M
2013-10-01
breeds were lowered after fitting the linear regression. The coefficient of determination (R(2)) of male and female Black Bengal and Jamunapari goat kids was similar.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A
2013-05-01
This manuscript discusses the application and comparison of three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to high-performance liquid chromatography response data using the internal standard method. This was performed on the model drug acyclovir, which was analyzed in human plasma with ganciclovir as internal standard. An in vivo study was also performed. Derivative treatment of the chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-point sin x(i) polynomials (discrete Fourier functions). This work studies and compares the application of the WR method and Theil's method, a nonparametric regression (NPR) method, with the least-squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use the WR method. WR was found to be superior to LSPR, as the former assumes that the y-direction error in the calibration curve increases as x increases. Theil's NPR method was also found to be superior to LSPR, as the former assumes that errors could occur in both x- and y-directions and might not be normally distributed. Most of the results showed a significant improvement in precision and accuracy on applying the WR and NPR methods relative to LSPR.
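The weighted-regression idea, downweighting the high concentrations that otherwise dominate the fit, can be sketched with statsmodels' WLS; the 1/x^2 weighting is one common choice, and the numbers are synthetic.

```python
# Weighted least squares for heteroscedastic calibration data.
import numpy as np
import statsmodels.api as sm

x = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0])         # concentration
y = np.array([0.052, 0.098, 0.21, 0.49, 1.03, 1.98])   # response ratio

wls = sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit()
ols = sm.OLS(y, sm.add_constant(x)).fit()
print(wls.params, ols.params)   # compare slope/intercept estimates
```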
Technology Transfer Automated Retrieval System (TEKTRAN)
Geospatial measurements of ancillary sensor data, such as bulk soil electrical conductivity or remotely sensed imagery data, are commonly used to characterize spatial variation in soil or crop properties. Geostatistical techniques like kriging with external drift or regression kriging are often use...
Xu, Xu; Chang, Chien-Chi; Lu, Ming-Lun
2012-01-01
Previous studies have indicated that cumulative L5/S1 joint load is a potential risk factor for low back pain. The assessment of cumulative L5/S1 joint load during a field study is challenging due to the difficulty of continuously monitoring the dynamic joint load. This study proposes two regression models predicting cumulative dynamic L5/S1 joint moment based on the static L5/S1 joint moment of a lifting task at lift-off and set-down and the lift duration. Twelve men performed lifting tasks at varying lifting ranges and asymmetric angles in a laboratory environment. The cumulative L5/S1 joint moment was calculated from continuous dynamic L5/S1 moments as the reference for comparison. The static L5/S1 joint moments at lift-off and set-down were measured for the two regression models. The prediction error of the cumulative L5/S1 joint moment was 21 ± 14 Nm × s (12% of the measured cumulative L5/S1 joint moment) and 14 ± 9 Nm × s (8%) for the first and the second models, respectively. Practitioner Summary: The proposed regression models may provide a practical approach for predicting the cumulative dynamic L5/S1 joint loading of a lifting task for field studies since it requires only the lifting duration and the static moments at the lift-off and/or set-down instants of the lift.
Wild bootstrap for quantile regression.
Feng, Xingdong; He, Xuming; Hu, Jianhua
2011-12-01
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points. PMID:23049133
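A minimal sketch of the wild bootstrap for median regression with statsmodels' QuantReg: refit on responses perturbed by weighted absolute residuals and read off bootstrap standard errors. The sign-flip weights used here are purely illustrative; the paper's point is precisely that the weight distribution must be drawn from an appropriate class for quantile estimators.

```python
# Wild bootstrap standard errors for median regression (illustrative weights).
import numpy as np
import statsmodels.api as sm

def wild_bootstrap_median(y, X, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    Xc = sm.add_constant(X)
    fit = sm.QuantReg(y, Xc).fit(q=0.5)
    fitted = fit.predict(Xc)
    resid = y - fitted
    draws = []
    for _ in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=len(y))      # sign-flip weights
        y_star = fitted + w * np.abs(resid)
        draws.append(sm.QuantReg(y_star, Xc).fit(q=0.5).params)
    return fit.params, np.asarray(draws).std(axis=0)  # bootstrap SEs
```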
NASA Astrophysics Data System (ADS)
Carniti, P.; Cassina, L.; Gotti, C.; Maino, M.; Pessina, G.
2016-07-01
In this work we present ALDO, an adjustable low drop-out linear regulator designed in AMS 0.35 μm CMOS technology. It is specifically tailored for use in the upgraded LHCb RICH detector in order to improve the power supply noise for the front end readout chip (CLARO). ALDO is designed with radiation-tolerant solutions such as an all-MOS band-gap voltage reference and layout techniques aiming to make it able to operate in harsh environments like High Energy Physics accelerators. It is capable of driving up to 200 mA while keeping an adequate power supply filtering capability in a very wide frequency range from 10 Hz up to 100 MHz. This property allows us to suppress the noise and high frequency spikes that could be generated by a DC/DC regulator, for example. ALDO also shows a very low noise of 11.6 μV RMS in the same frequency range. Its output is protected with over-current and short detection circuits for a safe integration in tightly packed environments. Design solutions and measurements of the first prototype are presented.
Tomlinson, Sean
2016-04-01
The calculation and comparison of physiological characteristics of thermoregulation have provided insight into patterns of ecology and evolution for over half a century. Thermoregulation has typically been explored using linear techniques; I explore the application of non-linear scaling to more accurately calculate and compare characteristics and thresholds of thermoregulation, including the basal metabolic rate (BMR), peak metabolic rate (PMR) and the lower (Tlc) and upper (Tuc) critical limits of the thermo-neutral zone (TNZ), for Australian rodents. An exponentially modified logistic function accurately characterised the response of metabolic rate to ambient temperature, while evaporative water loss was accurately characterised by a Michaelis-Menten function. When these functions were used to resolve unique parameters for the nine species studied here, the estimates of BMR and TNZ were consistent with previously published estimates. The approach resolved differences in rates of metabolism and water loss between subfamilies of Australian rodents that have not previously been quantified. I suggest that non-linear scaling is not only more effective than the established segmented linear techniques, but also more objective. This approach may allow broader and more flexible comparison of characteristics of thermoregulation, but it needs testing with a broader array of taxa than those used here.
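Fitting one of the non-linear forms named above, a Michaelis-Menten response, is a one-liner with scipy's curve_fit; the data and starting values are invented for illustration.

```python
# Non-linear fit of a Michaelis-Menten response with scipy.
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(t, vmax, km):
    return vmax * t / (km + t)

t_amb = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0])  # deg C
ewl = np.array([0.8, 1.4, 1.9, 2.2, 2.45, 2.6, 2.7])         # hypothetical EWL

(vmax, km), _ = curve_fit(michaelis_menten, t_amb, ewl, p0=[3.0, 10.0])
print(f"Vmax = {vmax:.2f}, Km = {km:.1f}")
```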
Meloun, Milan; Bordovská, Sylva; Syrový, Tomás; Vrána, Ales
2006-10-27
Although modern instrumentation enables increased amounts of data to be delivered in a shorter time, computer-assisted spectra analysis is limited by the intelligence built into the programmed logic tools. The proposed tutorial covers all the main steps of the data processing involved in chemical model building, from calculating the concentration profiles to fitting, by spectra regression, the protonation constants of the chemical model to the multiwavelength and multivariate data measured. Suggested diagnostics are examined to see whether the chemical model hypothesis can be accepted, as an incorrect model with false stoichiometric indices may lead to slow convergence, cyclization or divergence of the regression minimization. The diagnostics concern the physical meaning of the unknown parameters beta(qr) and epsilon(qr), the physical sense of the associated species concentrations, parametric correlation coefficients, goodness-of-fit tests, error analyses and spectra deconvolution, and determination of the correct number of light-absorbing species. All of the benefits of spectrophotometric data analysis are demonstrated on the protonation constants of the ionizable anticancer drug 7-ethyl-10-hydroxycamptothecine, using data double-checked with the SQUAD(84) and SPECFIT/32 regression programs and with factor analysis using the INDICES program. The experimental determination of protonation constants was combined with their computational prediction, based on knowledge of the chemical structure of the drug, using the combined MARVIN and PALLAS programs. If the proposed model adequately represents the data, the residuals should form a random pattern with a normal distribution N(0, s2), with the residual mean equal to zero and the standard deviation of residuals near the experimental noise. Examination of residual plots may be assisted by graphical analysis of residuals; systematic departures from randomness indicate that the model and parameter estimates are not satisfactory.
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and non-normality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of the input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation, because nonparametric multisegment regression tools are not available in the software commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program reads a two- or three-column tab-delimited input file with variable names in the first row and data in the rows that follow.
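The estimator the software implements is simple to state: the slope is the median of all pairwise slopes and the intercept forces the line through the medians. A direct sketch of the single-segment case, without the error-component and bias-correction machinery of KTRLine:

```python
# Kendall-Theil robust line: median of pairwise slopes, line through medians.
import numpy as np

def kendall_theil_line(x, y):
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)
              if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(y) - slope * np.median(x)
    return slope, intercept
```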
Gerber, Samuel; Rubel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-19
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.
Koizumi, Hideya; Whitten, William B; Reilly, Pete
2008-12-01
High-resolution real-time particle mass measurements have not been achievable because the enormous amount of kinetic energy imparted to the particles upon expansion into vacuum competes with and overwhelms the forces applied to the charged particles within the mass spectrometer. It is possible to reduce the kinetic energy of a collimated particulate ion beam through collisions with a buffer gas while radially constraining their motion using a quadrupole guide or trap over a limited mass range. Controlling the pressure drop of the final expansion into a quadrupole trap permits a much broader mass range at the cost of sacrificing collimation. To achieve high-resolution mass analysis of massive particulate ions, an efficient trap with a large tolerance for radial divergence of the injected ions was developed that permits trapping a large range of ions for on-demand injection into an awaiting mass analyzer. The design specifications required that frequency of the trapping potential be adjustable to cover a large mass range and the trap radius be increased to increase the tolerance to divergent ion injection. The large-radius linear quadrupole ion trap was demonstrated by trapping singly-charged bovine serum albumin ions for on-demand injection into a mass analyzer. Additionally, this work demonstrates the ability to measure an electrophoretic mobility cross section (or ion mobility) of singly-charged intact proteins in the low-pressure regime. This work represents a large step toward the goal of high-resolution analysis of intact proteins, RNA, DNA, and viruses.
NASA Astrophysics Data System (ADS)
Ghelawi, M. A.; Moore, J. S.; Bisby, R. H.; Dodd, N. J. F.
2001-01-01
Food spoilage is caused by infestation by insects, contamination by bacteria and fungi, and deterioration by enzymes. In the third world, it has been estimated that 25% of agricultural products are lost before they reach the market. One way to decrease such losses is treatment with ionising radiation, and maximum permitted doses have been established for the treatment of a wide variety of foods. For dates this dose is 2.0 kGy. Detection of irradiated foods is now essential, and here we have used ESR to detect and estimate the dose received by a single date. The ESR spectrum of an unirradiated date stone contains a single line at g=2.0045 (signal A). Irradiation up to 2.0 kGy induces radical formation with lines at g=1.9895 and g=2.0159 (signal C) and at g=1.9984 (signal B) at high field. The lines with g=1.9895 and 2.0159 are readily detected and stable at room temperature for at least 27 months for samples irradiated up to this dose. The yield of the radicals giving rise to these lines increases linearly up to a dose of 5.0 kGy, as evidenced by the linear increase in their intensity. In blind trials of 21 unirradiated and irradiated dates we were able to identify an irradiated sample with 100% accuracy and to estimate the dose to which the sample had been irradiated to within ~0.5 kGy.
ERIC Educational Resources Information Center
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
Gebrehiwot, Tesfay Gebregzabher; San Sebastian, Miguel; Edin, Kerstin; Goicolea, Isabel
2015-01-01
Background: In 2003, the Ethiopian Ministry of Health established the Health Extension Program (HEP), with the goal of improving access to health care and health promotion activities in rural areas of the country. This paper aims to assess the association of the HEP with improved utilization of maternal health services in Northern Ethiopia using institution-based retrospective data. Methods: Average quarterly total attendances for antenatal care (ANC), delivery care (DC) and post-natal care (PNC) at health posts and health centres were studied from 2002 to 2012. Regression analysis was applied to two models to assess whether trends were statistically significant. One model was used to estimate the level and trend changes associated with the immediate period of intervention, while changes related to the post-intervention period were estimated by the other. Results: The total number of consultations for ANC, DC and PNC increased constantly, particularly after the late-intervention period. Increases were greater for ANC and PNC at the health-post level and for DC at health centres. A positive, statistically significant upward trend was found for DC and PNC in all facilities (p<0.01). The positive trend was also present for ANC at health centres (p = 0.04), but not at health posts. Conclusion: Our findings revealed an increase in the use of antenatal, delivery and post-natal care after the introduction of the HEP, although other factors that we could not control for might explain part of this increase. The figures for DC and PNC are nevertheless low, and more needs to be done to increase both access to the health care system and the demand for these services by the population. Strengthening of the health information system in the region also needs to be prioritized. PMID:26218074
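The two-model trend analysis described in the Methods is essentially an interrupted time-series regression: a pre-intervention trend plus level-change and trend-change terms. A sketch with invented quarterly attendance data and an assumed intervention quarter:

```python
# Interrupted time-series regression: level and trend change at the intervention.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
q = np.arange(40)                      # quarterly index (invented series)
post = (q >= 8).astype(float)          # assumed intervention quarter
y = 50 + 1.2*q + 15*post + 2.0*post*(q - 8) + rng.normal(0, 5, 40)

X = sm.add_constant(np.column_stack([q, post, post * (q - 8)]))
fit = sm.OLS(y, X).fit()
# coefficients: baseline level, pre-trend, level change, trend change
print(fit.params, fit.pvalues)
```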
Regression problems for magnitudes
NASA Astrophysics Data System (ADS)
Castellaro, S.; Mulargia, F.; Kagan, Y. Y.
2006-06-01
Least-squares linear regression is so popular that it is sometimes applied without checking whether its basic requirements are satisfied. In particular, in studying earthquake phenomena, the conditions (a) that the uncertainty on the independent variable is at least one order of magnitude smaller than the one on the dependent variable, (b) that both data and uncertainties are normally distributed and (c) that residuals are constant are at times disregarded. This may easily lead to wrong results. As an alternative to least squares, when the ratio between errors on the independent and the dependent variable can be estimated, orthogonal regression can be applied. We test the performance of orthogonal regression in its general form against Gaussian and non-Gaussian data and error distributions and compare it with standard least-square regression. General orthogonal regression is found to be superior or equal to the standard least squares in all the cases investigated and its use is recommended. We also compare the performance of orthogonal regression versus standard regression when, as often happens in the literature, the ratio between errors on the independent and the dependent variables cannot be estimated and is arbitrarily set to 1. We apply these results to magnitude scale conversion, which is a common problem in seismology, with important implications in seismic hazard evaluation, and analyse it through specific tests. Our analysis concludes that the commonly used standard regression may induce systematic errors in magnitude conversion as high as 0.3-0.4, and, even more importantly, this can introduce apparent catalogue incompleteness, as well as a heavy bias in estimates of the slope of the frequency-magnitude distributions. All this can be avoided by using the general orthogonal regression in magnitude conversions.
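General orthogonal regression, with the error ratio supplied explicitly, is available in scipy's odr module; the magnitudes and error levels below are synthetic placeholders, not catalogue data.

```python
# Orthogonal distance regression for magnitude scale conversion.
import numpy as np
from scipy import odr

x = np.array([4.1, 4.5, 5.0, 5.4, 6.0, 6.3])   # magnitudes, scale 1
y = np.array([4.0, 4.6, 5.1, 5.6, 6.1, 6.5])   # magnitudes, scale 2

linear = odr.Model(lambda beta, x: beta[0] * x + beta[1])
data = odr.RealData(x, y, sx=0.2, sy=0.2)      # comparable uncertainties
result = odr.ODR(data, linear, beta0=[1.0, 0.0]).run()
print(result.beta)                              # slope, intercept
```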
Mariani, L; Coradini, D; Biganzoli, E; Boracchi, P; Marubini, E; Pilotti, S; Salvadori, B; Silvestrini, R; Veronesi, U; Zucali, R; Rilke, F
1997-06-01
The purpose of the present study was to assess prognostic factor for metachronous contralateral recurrence of breast cancer (CBC). Two factors were of particular interest, namely estrogen (ER) and progesterone (PgR) receptors assayed with the biochemical method in primary tumor tissue. Information was obtained from a prospective clinical database for 1763 axillary node-negative women who had received curative surgery, mostly of the conservative type, and followed-up for a median of 82 months. The analysis was performed based on both a standard (linear) Cox model and an artificial neural network (ANN) extension of this model proposed by Faraggi and Simon. Furthermore, to assess the prognostic importance of the factors considered, model predictive ability was computed. In agreement with already published studies, the results of our analysis confirmed the prognostic role of age at surgery, histology, and primary tumor site, in that young patients (< or = 45 years) with tumors of lobular histology or located at inner/central mammary quadrants were at greater risk of developing CBC. ER and PgR were also shown to have a prognostic role. Their effect, however, was not simple in relation to the presence of interactions between ER and age, and between PgR and histology. In fact, ER appeared to play a protective role in young patients, whereas the opposite was true in older women. Higher levels of PgR implied a greater hazard of CBC occurrence in infiltrating duct carcinoma or tumors with an associated extensive intraductal component, and a lower hazard in infiltrating lobular carcinoma or other histotypes. In spite of the above findings, the predictive value of both the standard and ANN Cox models was relatively low, thus suggesting an intrinsic limitation of the prognostic variables considered, rather than their suboptimal modeling. Research for better prognostic variables should therefore continue.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects, by comparing their results with those from using an interaction term in linear regression. The research questions which each model answers, their…
Jaworski, N W; Liu, D W; Li, D F; Stein, H H
2016-07-01
An experiment was conducted to determine the effects on DE, ME, and NE for growing pigs of adding 15 or 30% wheat bran to a corn-soybean meal diet and to compare values for DE, ME, and NE calculated using the difference procedure with values obtained using linear regression. Eighteen barrows (54.4 ± 4.3 kg initial BW) were individually housed in metabolism crates. The experiment had 3 diets and 6 replicate pigs per diet. The control diet contained corn, soybean meal, and no wheat bran. Two additional diets were formulated by mixing 15 or 30% wheat bran with 85 or 70% of the control diet, respectively. The experimental period lasted 15 d. During the initial 7 d, pigs were adapted to their experimental diets, housed in metabolism crates, and fed 573 kcal ME/kg BW per day. On d 8, the metabolism crates with the pigs were moved into open-circuit respiration chambers for measurement of O2 consumption and CO2 and CH4 production. The feeding level was the same as in the adaptation period, and feces and urine were collected during this period. On d 13 and 14, pigs were fed 225 kcal ME/kg BW per day, and pigs were then fasted for 24 h to obtain fasting heat production. Results of the experiment indicated that the apparent total tract digestibility of DM, GE, crude fiber, ADF, and NDF linearly decreased (P ≤ 0.05) as wheat bran inclusion increased in the diets. The daily O2 consumption and CO2 and CH4 production by pigs fed increasing concentrations of wheat bran linearly decreased (P ≤ 0.05), resulting in a linear decrease (P ≤ 0.05) in heat production. The DE (3,454, 3,257, and 3,161 kcal/kg), ME (3,400, 3,209, and 3,091 kcal/kg), and NE (1,808, 1,575, and 1,458 kcal/kg) of diets containing 0, 15, and 30% wheat bran, respectively, decreased (linear, P ≤ 0.05) as wheat bran inclusion increased
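As a rough illustration of the regression procedure being compared with the difference method, one can regress diet DE on inclusion level and extrapolate the fitted line to 100% wheat bran; the diet values below are those quoted in the abstract, but the extrapolated figure is only a sketch, not a result reported by the study.

```python
import numpy as np

# Diet DE (kcal/kg) at 0, 15, and 30% wheat bran inclusion, as quoted above.
inclusion = np.array([0.00, 0.15, 0.30])
diet_de = np.array([3454.0, 3257.0, 3161.0])

# Fit DE = intercept + slope * inclusion, then extrapolate to 100% bran.
slope, intercept = np.polyfit(inclusion, diet_de, 1)
de_bran = intercept + slope * 1.0
print(f"regression estimate of wheat bran DE: {de_bran:.0f} kcal/kg")
```

The same two lines, repeated with the ME and NE values, give the corresponding regression estimates for those energy measures.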
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…
Wild bootstrap for quantile regression.
Feng, Xingdong; He, Xuming; Hu, Jianhua
2011-12-01
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points. PMID:23049133
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models. PMID:22193911
Incremental learning for ν-Support Vector Regression.
Gu, Bin; Sheng, Victor S; Wang, Zhijie; Ho, Derek; Osman, Said; Li, Shuo
2015-07-01
The ν-Support Vector Regression (ν-SVR) is an effective regression learning algorithm, which has the advantage of using a parameter ν to control the number of support vectors and adjust the width of the tube automatically. However, compared to ν-Support Vector Classification (ν-SVC) (Schölkopf et al., 2000), ν-SVR introduces an additional linear term into its objective function. Thus, directly applying the accurate on-line ν-SVC algorithm (AONSVM) to ν-SVR will not generate an effective initial solution, and designing an incremental ν-SVR learning algorithm is therefore the main challenge. To overcome this challenge, we propose a special procedure called initial adjustments in this paper. This procedure adjusts the weights of ν-SVC based on the Karush-Kuhn-Tucker (KKT) conditions to prepare an initial solution for the incremental learning. Combining the initial adjustments with the two steps of AONSVM produces an exact and effective incremental ν-SVR learning algorithm (INSVR). Theoretical analysis has proven the existence of the three key inverse matrices, which are the cornerstones of the three steps of INSVR (including the initial adjustments). The experiments on benchmark datasets demonstrate that INSVR can avoid infeasible updating paths as far as possible and successfully converges to the optimal solution. The results also show that INSVR is faster than batch ν-SVR algorithms with both cold and warm starts.
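INSVR itself is not, to our knowledge, available in mainstream libraries, but the role of the ν parameter the paper builds on can be seen with scikit-learn's batch ν-SVR (an assumed dependency):

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# nu lower-bounds the fraction of support vectors, so the fraction
# should grow with nu.
for nu in (0.1, 0.5, 0.9):
    model = NuSVR(nu=nu, C=1.0, kernel="rbf").fit(X, y)
    print(f"nu={nu:.1f}  support-vector fraction={len(model.support_) / len(X):.2f}")
```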
Teaching Practices and the Promotion of Achievement and Adjustment in First Grade
ERIC Educational Resources Information Center
Perry, Kathryn E.; Donohue, Kathleen M.; Weinstein, Rhona S.
2007-01-01
The effects of teacher practices in promoting student academic achievement, behavioral adjustment, and feelings of competence were investigated in a prospective study of 257 children in 14 first grade classrooms. Using hierarchical linear modeling and regression techniques, observed teaching practices in the fall were explored as predictors of…
[Structural adjustment, cultural adjustment?].
Dujardin, B; Dujardin, M; Hermans, I
2003-12-01
Over the last two decades, multiple studies have been conducted and many articles published about Structural Adjustment Programmes (SAPs). These studies mainly describe the characteristics of SAPs and analyse their economic consequences as well as their effects upon a variety of sectors: health, education, agriculture and environment. However, very few focus on the sociological and cultural effects of SAPs. Following a summary of SAPs' content and characteristics, the paper briefly discusses the historical course of SAPs and the different critiques which have been made. The cultural consequences of SAPs are introduced and are described on four different levels: political, community, familial, and individual. These levels are analysed through examples from the literature and individual testimonies from people in the Southern Hemisphere. The paper concludes that SAPs, alongside economic globalisation processes, are responsible for an acute breakdown of social and cultural structures in societies in the South. It should be a priority not only to better understand the situation and its determining factors, but also to intervene and act with strategies that support and reinvest in the social and cultural sectors, which is vital in order to allow individuals and communities in the South to strengthen their autonomy and identity.
Ridge Regression Signal Processing
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
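For context, here is a minimal sketch of the ridge estimator that the report's recursive filter builds on (not the linearized recursive formulation itself):

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimate beta = (X'X + kI)^(-1) X'y.

    The penalty k trades a little bias for a large variance reduction
    when X'X is ill-conditioned, as under poor satellite geometry.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```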
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing their results with those from using an interaction term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Quality Reporting of Multivariable Regression Models in Observational Studies
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M.
2016-01-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. We reviewed a representative sample of articles indexed in MEDLINE (n = 428) with an observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimates, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles, and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature. PMID:27196467
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression.
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and an area under the ROC curve of Me = 0.90. However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with a high area under the ROC curve (Me = 0.73), specificity (Me = 0.73), and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with an acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights.
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has been the subject of much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Foster, Guy M.; Graham, Jennifer L.
2016-04-06
The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete water-quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes
Application and Interpretation of Hierarchical Multiple Regression.
Jeong, Younhee; Jung, Mi Jung
2016-01-01
The authors reported the association between motivation and self-management behavior of individuals with chronic low back pain after adjusting for control variables using hierarchical multiple regression. This article describes the details of that hierarchical regression, applying the actual data used in the original article, including how to test assumptions, run the statistical tests, and report the results. PMID:27648796
Correlation Weights in Multiple Regression
ERIC Educational Resources Information Center
Waller, Niels G.; Jones, Jeff A.
2010-01-01
A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…
Cactus: An Introduction to Regression
ERIC Educational Resources Information Center
Hyde, Hartley
2008-01-01
When the author first used "VisiCalc," he thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, he relates, he learned to use multiple linear regression software and suddenly it all clicked into…
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
Na, Hyunjoo; Dancy, Barbara L; Park, Chang
2015-06-01
The study's purpose was to explore whether frequency of cyberbullying victimization, cognitive appraisals, and coping strategies were associated with psychological adjustments among college student cyberbullying victims. A convenience sample of 121 students completed questionnaires. Linear regression analyses found frequency of cyberbullying victimization, cognitive appraisals, and coping strategies respectively explained 30%, 30%, and 27% of the variance in depression, anxiety, and self-esteem. Frequency of cyberbullying victimization and approach and avoidance coping strategies were associated with psychological adjustments, with avoidance coping strategies being associated with all three psychological adjustments. Interventions should focus on teaching cyberbullying victims to not use avoidance coping strategies. PMID:26001714
Adolescent suicide attempts and adult adjustment
Brière, Frédéric N.; Rohde, Paul; Seeley, John R.; Klein, Daniel; Lewinsohn, Peter M.
2014-01-01
Background Adolescent suicide attempts are disproportionately prevalent and frequently of low severity, raising questions regarding their long-term prognostic implications. In this study, we examined whether adolescent attempts were associated with impairments related to suicidality, psychopathology, and psychosocial functioning in adulthood (objective 1) and whether these impairments were better accounted for by concurrent adolescent confounders (objective 2). Method 816 adolescents were assessed using interviews and questionnaires at four time points from adolescence to adulthood. We examined whether lifetime suicide attempts in adolescence (by T2, mean age 17) predicted adult outcomes (by T4, mean age 30) using linear and logistic regressions in unadjusted models (objective 1) and adjusting for sociodemographic background, adolescent psychopathology, and family risk factors (objective 2). Results In unadjusted analyses, adolescent suicide attempts predicted poorer adjustment on all outcomes, except those related to social role status. After adjustment, adolescent attempts remained predictive of axis I and II psychopathology (anxiety disorder, antisocial and borderline personality disorder symptoms), global and social adjustment, risky sex, and psychiatric treatment utilization. However, adolescent attempts no longer predicted most adult outcomes, notably suicide attempts and major depressive disorder. Secondary analyses indicated that associations did not differ by sex or attempt characteristics (intent, lethality, recurrence). Conclusions Adolescent suicide attempters are at high risk of protracted and wide-ranging impairments, regardless of the characteristics of their attempt. Although attempts specifically predict (and possibly influence) several outcomes, results suggest that most impairments reflect the confounding contributions of other individual and family problems or vulnerabilities in adolescent attempters. PMID:25421360
Jiang, Honghua; Kulkarni, Pandurang M; Mallinckrodt, Craig H; Shurzinske, Linda; Molenberghs, Geert; Lipkovich, Ilya
2015-01-01
The benefits of adjusting for baseline covariates are not as straightforward with repeated binary responses as with continuous response variables. Therefore, in this study, we compared different methods for analyzing repeated binary data through simulations when the outcome at the study endpoint is of interest. Methods compared included chi-square, Fisher's exact test, covariate adjusted/unadjusted logistic regression (Adj.logit/Unadj.logit), covariate adjusted/unadjusted generalized estimating equations (Adj.GEE/Unadj.GEE), and covariate adjusted/unadjusted generalized linear mixed models (Adj.GLMM/Unadj.GLMM). All these methods preserved the type I error close to the nominal level. Covariate adjusted methods improved power compared with the unadjusted methods because of the increased treatment effect estimates, especially when the correlation between the baseline and outcome was strong, even though there was an apparent increase in standard errors. Results of the chi-square test were identical to those for the unadjusted logistic regression. Fisher's exact test was the most conservative test regarding the type I error rate and also had the lowest power. Without missing data, there was no gain in using a repeated measures approach over a simple logistic regression at the final time point. Analysis of results from five phase III diabetes trials of the same compound was consistent with the simulation findings. Therefore, covariate adjusted analysis is recommended for repeated binary data when the study endpoint is of interest. PMID:25866149
NASA Astrophysics Data System (ADS)
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
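A hedged sketch of the winning model class, using statsmodels; the paper's actual model orders and lag choices are not reproduced here, so the specification below is illustrative only.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def fit_sarima(weekly_cases: pd.Series, climate: pd.DataFrame):
    """SARIMA with climate covariates (e.g. temperature lagged 2 weeks,
    rainfall) supplied as exogenous regressors."""
    model = SARIMAX(weekly_cases, exog=climate,
                    order=(1, 0, 1),               # illustrative ARMA terms
                    seasonal_order=(1, 0, 0, 52))  # yearly cycle in weekly data
    return model.fit(disp=False)

# forecasting ability can then be checked on held-out data, as the study
# did with 2004 cases:
# res.forecast(steps=len(test_cases), exog=test_climate)
```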
Relationship between Multiple Regression and Selected Multivariable Methods.
ERIC Educational Resources Information Center
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Kleinman, Ken; Gillman, Matthew W.
2014-01-01
We implemented 6 confounding adjustment methods: 1) covariate-adjusted regression, 2) propensity score (PS) regression, 3) PS stratification, 4) PS matching with two calipers, 5) inverse-probability weighting, and 6) doubly robust estimation to examine the associations between the BMI z-score at 3 years and two separate dichotomous exposure measures: exclusive breastfeeding versus formula only (N = 437) and cesarean section versus vaginal delivery (N = 1236). Data were drawn from a prospective pre-birth cohort study, Project Viva. The goal is to demonstrate the necessity and usefulness of, and approaches to, multiple confounding adjustment methods for analyzing observational data. Unadjusted (univariate) and covariate-adjusted linear regression associations of breastfeeding with BMI z-score were −0.33 (95% CI −0.53, −0.13) and −0.24 (−0.46, −0.02), respectively. The other approaches resulted in smaller N (204 to 276) because of poor overlap of covariates, but CIs were of similar width except for inverse-probability weighting (75% wider) and PS matching with a wider caliper (76% wider). Point estimates ranged widely, however, from −0.01 to −0.38. For cesarean section, because of better covariate overlap, the covariate-adjusted regression estimate (0.20) was remarkably robust to all adjustment methods, and the widths of the 95% CIs differed less than in the breastfeeding example. Choice of covariate adjustment method can matter. Lack of overlap in covariate structure between exposed and unexposed participants in observational studies can lead to erroneous covariate-adjusted estimates and confidence intervals. We recommend inspecting covariate overlap and using multiple confounding adjustment methods. Similar results bring reassurance. Contradictory results suggest issues with either the data or the analytic method. PMID:25171142
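Two of the six methods, sketched under assumed variable names (not those of Project Viva): a covariate-adjusted regression and an inverse-probability-weighted estimate built from a fitted propensity score.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

def ipw_estimate(X, treated, outcome):
    """IPW estimate of a treatment effect on a continuous outcome.

    X: confounder matrix; treated: 0/1 exposure; outcome: e.g. BMI z-score.
    """
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    weights = np.where(treated == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    design = sm.add_constant(treated.astype(float))
    return sm.WLS(outcome, design, weights=weights).fit().params[1]

def adjusted_estimate(X, treated, outcome):
    """Covariate-adjusted regression estimate of the same effect."""
    design = sm.add_constant(np.column_stack([treated, X]))
    return sm.OLS(outcome, design).fit().params[1]
```

Inspecting the overlap of the fitted propensity scores between exposed and unexposed groups, as the authors recommend, is a one-line histogram before trusting either number.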
Deriving the Regression Equation without Using Calculus
ERIC Educational Resources Information Center
Gordon, Sheldon P.; Gordon, Florence S.
2004-01-01
Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…
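One standard calculus-free route (not necessarily the article's own development) completes the square in the slope. Assuming the intercept has already been set so the line passes through the point of means, and writing S_xx, S_yy, S_xy for the centered sums of squares and cross-products:

```latex
\mathrm{SSE}(b) = \sum_{i=1}^{n}\bigl[(y_i-\bar{y})-b(x_i-\bar{x})\bigr]^{2}
               = S_{yy} - 2b\,S_{xy} + b^{2}S_{xx}
               = S_{xx}\left(b-\frac{S_{xy}}{S_{xx}}\right)^{2}
                 + S_{yy} - \frac{S_{xy}^{2}}{S_{xx}}
```

Since the squared term is the only part that depends on b, the sum of squared errors is smallest at b = S_xy / S_xx, and the intercept follows as a = ȳ − b x̄.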
Dealing with Outliers: Robust, Resistant Regression
ERIC Educational Resources Information Center
Glasser, Leslie
2007-01-01
Least-squares linear regression is the best of statistics and it is the worst of statistics. The reasons for this paradoxical claim, arising from possible inapplicability of the method and the excessive influence of "outliers", are discussed and substitute regression methods based on median selection, which is both robust and resistant, are…
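One widely available median-based alternative is the Theil-Sen estimator; scipy implements it directly, so a sketch is easy to run:

```python
import numpy as np
from scipy.stats import theilslopes

rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=30)
y[[3, 17]] += 40.0  # two gross outliers

# Theil-Sen: the median of all pairwise slopes, resistant to the outliers.
slope, intercept, lo, hi = theilslopes(y, x)
print(f"slope={slope:.2f} (95% CI {lo:.2f}-{hi:.2f}), intercept={intercept:.2f}")
```

Ordinary least squares on the same data would be dragged noticeably toward the two contaminated points.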
Illustration of Regression towards the Means
ERIC Educational Resources Information Center
Govindaraju, K.; Haslett, S. J.
2008-01-01
This article presents a procedure for generating a sequence of data sets which will yield exactly the same fitted simple linear regression equation y = a + bx. Unless rescaled, the generated data sets will have progressively smaller variability for the two variables, and the associated response and covariate will "regress" towards their…
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
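Both model families can be sketched with the third-party lifelines package (an assumption; the chapter itself works from likelihood theory rather than any particular software):

```python
import pandas as pd
from lifelines import CoxPHFitter, WeibullAFTFitter

# toy right-censored data: event = 0 marks a censored follow-up time
df = pd.DataFrame({
    "time":  [5.0, 8.0, 12.0, 3.0, 9.0, 15.0, 6.0, 11.0],
    "event": [1,   0,   1,    1,   0,   1,    1,   0],
    "age":   [61,  54,  70,   48,  66,  59,   63,  52],
})

# semiparametric model: covariables act multiplicatively on the hazard
cox = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# parametric AFT model: covariables shift (log) survival time linearly
aft = WeibullAFTFitter().fit(df, duration_col="time", event_col="event")
```

Both fitters are driven by maximum likelihood (partial likelihood in the Cox case), matching the chapter's central tool.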
Analyzing Historical Count Data: Poisson and Negative Binomial Regression Models.
ERIC Educational Resources Information Center
Beck, E. M.; Tolnay, Stewart E.
1995-01-01
Asserts that traditional approaches to multivariate analysis, including standard linear regression techniques, ignore the special character of count data. Explicates three suitable alternatives to standard regression techniques, a simple Poisson regression, a modified Poisson regression, and a negative binomial model. (MJP)
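A sketch of two of the count models with statsmodels, on simulated placeholder data rather than the article's historical counts:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3, -0.2])))

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
# the negative binomial relaxes the Poisson equal mean-variance restriction
negbin_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
print(poisson_fit.params, negbin_fit.params)
```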
The Regression Trunk Approach to Discover Treatment Covariate Interaction
ERIC Educational Resources Information Center
Dusseldorp, Elise; Meulman, Jacqueline J.
2004-01-01
The regression trunk approach (RTA) is an integration of regression trees and multiple linear regression analysis. In this paper RTA is used to discover treatment covariate interactions, in the regression of one continuous variable on a treatment variable with "multiple" covariates. The performance of RTA is compared to the classical method of…
Logistic regression: a brief primer.
Stoltzfus, Jill C
2011-10-01
Regression techniques are versatile in their application to medical research because they can measure associations, predict outcomes, and control for confounding variable effects. As one such technique, logistic regression is an efficient and powerful way to analyze the effect of a group of independent variables on a binary outcome by quantifying each independent variable's unique contribution. Using components of linear regression reflected in the logit scale, logistic regression iteratively identifies the strongest linear combination of variables with the greatest probability of detecting the observed outcome. Important considerations when conducting logistic regression include selecting independent variables, ensuring that relevant assumptions are met, and choosing an appropriate model building strategy. For independent variable selection, one should be guided by such factors as accepted theory, previous empirical investigations, clinical considerations, and univariate statistical analyses, with acknowledgement of potential confounding variables that should be accounted for. Basic assumptions that must be met for logistic regression include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers. Additionally, there should be an adequate number of events per independent variable to avoid an overfit model, with commonly recommended minimum "rules of thumb" ranging from 10 to 20 events per covariate. Regarding model building strategies, the three general types are direct/standard, sequential/hierarchical, and stepwise/statistical, with each having a different emphasis and purpose. Before reaching definitive conclusions from the results of any of these methods, one should formally quantify the model's internal validity (i.e., replicability within the same data set) and external validity (i.e., generalizability beyond the current sample). The resulting logistic regression model
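A small sketch (statsmodels assumed) that ties the primer's events-per-variable rule of thumb to an actual fit:

```python
import statsmodels.api as sm

def fit_logistic(X, y, min_epv=10):
    """Fit a logistic regression after checking events per variable."""
    events = int(min(y.sum(), len(y) - y.sum()))  # count the rarer outcome
    epv = events / X.shape[1]
    if epv < min_epv:
        print(f"warning: only {epv:.1f} events per variable; overfit risk")
    return sm.Logit(y, sm.add_constant(X)).fit(disp=False)
```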
Tharrington, Arnold N.
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python; its only dependency is a Subversion repository, which stores the regression tests.
Hybrid fuzzy regression with trapezoidal fuzzy data
NASA Astrophysics Data System (ADS)
Razzaghnia, T.; Danesh, S.; Maleki, A.
2011-12-01
This research deals with a method for hybrid fuzzy least-squares regression. The extension of symmetric triangular fuzzy coefficients to asymmetric trapezoidal fuzzy coefficients is considered an effective measure for removing unnecessary fuzziness of the linear fuzzy model. First, a trapezoidal fuzzy variable is applied to derive a bivariate regression model. Next, normal equations are formulated to solve the four parts of the hybrid regression coefficients. The model is also extended to multiple regression analysis. Finally, the method is compared with Y.-H. O. Chang's model.
Harry, H.H.
1988-03-11
Apparatus and method for the adjustment and alignment of shafts in high power devices. A plurality of adjacent rotatable angled cylinders are positioned between a base and the shaft to be aligned which when rotated introduce an axial offset. The apparatus is electrically conductive and constructed of a structurally rigid material. The angled cylinders allow the shaft such as the center conductor in a pulse line machine to be offset in any desired alignment position within the range of the apparatus. 3 figs.
Harry, Herbert H.
1989-01-01
Apparatus and method for the adjustment and alignment of shafts in high power devices. A plurality of adjacent rotatable angled cylinders are positioned between a base and the shaft to be aligned which when rotated introduce an axial offset. The apparatus is electrically conductive and constructed of a structurally rigid material. The angled cylinders allow the shaft such as the center conductor in a pulse line machine to be offset in any desired alignment position within the range of the apparatus.
Ehrsam, Eric; Kallini, Joseph R.; Lebas, Damien; Modiano, Philippe; Cotten, Hervé
2016-01-01
Fully regressive melanoma is a phenomenon in which the primary cutaneous melanoma becomes completely replaced by fibrotic components as a result of host immune response. Although 10 to 35 percent of cases of cutaneous melanomas may partially regress, fully regressive melanoma is very rare; only 47 cases have been reported in the literature to date. All of the cases of fully regressive melanoma reported in the literature were diagnosed in conjunction with metastasis in a patient. The authors describe a case of fully regressive melanoma without any metastases at the time of its diagnosis. Characteristic findings on dermoscopy, as well as the absence of melanoma on final biopsy, confirmed the diagnosis. PMID:27672418
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context.
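One of the proposed modifications can be sketched directly: a Poisson time series regression with the log of lagged counts as a covariate, with the scale estimated from the Pearson chi-square to mimic a quasi-Poisson fit. Column names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_weather_model(df: pd.DataFrame):
    """df columns: 'cases' and 'rain_lag2' (rainfall lagged two weeks)."""
    df = df.assign(log_lag_cases=np.log(df["cases"].shift(1) + 1)).dropna()
    X = sm.add_constant(df[["rain_lag2", "log_lag_cases"]])
    # scale='X2' estimates overdispersion, as in a quasi-Poisson model
    return sm.GLM(df["cases"], X, family=sm.families.Poisson()).fit(scale="X2")
```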
Least-Squares Data Adjustment with Rank-Deficient Data Covariance Matrices
Williams, J.G.
2011-07-01
A derivation of the linear least-squares adjustment formulae is required that avoids the assumption that the covariance matrix of prior parameters can be inverted. Possible proofs are of several kinds, including: (i) extension of standard results for the linear regression formulae, and (ii) minimization by differentiation of a quadratic form of the deviations in parameters and responses. In this paper, the least-squares adjustment equations are derived in both these ways, while explicitly assuming that the covariance matrix of prior parameters is singular. It will be proved that the solutions are unique and that, contrary to statements that have appeared in the literature, the least-squares adjustment problem is not ill-posed. No modification is required to the adjustment formulae that have been used in the past in the case of a singular covariance matrix for the priors. In conclusion: the linear least-squares adjustment formula that has been used in the past is valid in the case of a singular covariance matrix of prior parameters. Furthermore, it provides a unique solution. Statements in the literature to the effect that the problem is ill-posed are wrong, and no regularization of the problem is required. This has been proved in the present paper by two methods, while explicitly assuming that the covariance matrix of prior parameters is singular: (i) extension of standard results for the linear regression formulae, and (ii) minimization by differentiation of a quadratic form of the deviations in parameters and responses. No modification is needed to the adjustment formulae that have been used in the past. (author)
Quantile Regression With Measurement Error
Wei, Ying; Carroll, Raymond J.
2010-01-01
Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. PMID:20305802
Quantile regression modeling for Malaysian automobile insurance premium data
NASA Astrophysics Data System (ADS)
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is more robust to outliers than mean regression models. Traditional mean regression models like the Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of the response variable (pure premium). We then compare the results of the quantile regression model with the Gamma regression model. The results from quantile regression show that some rating classes increase as the quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = 0.5) is always narrower than that of Gamma regression for all risk factors.
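A sketch of the quantile-by-quantile comparison with statsmodels, on placeholder data in place of the Malaysian premium set:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(3)
X = sm.add_constant(rng.uniform(0, 1, size=(300, 1)))
y = 2.0 * X[:, 1] + rng.gamma(shape=2.0, scale=0.5, size=300)

# how a 'rating class' effect moves across the premium distribution
for tau in (0.25, 0.50, 0.75):
    fit = QuantReg(y, X).fit(q=tau)
    print(f"tau={tau:.2f}: slope={fit.params[1]:.3f}")
```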
Prediction in Multiple Regression.
ERIC Educational Resources Information Center
Osborne, Jason W.
2000-01-01
Presents the concept of prediction via multiple regression (MR) and discusses the assumptions underlying multiple regression analyses. Also discusses shrinkage, cross-validation, and double cross-validation of prediction equations and describes how to calculate confidence intervals around individual predictions. (SLD)
The Geometry of Enhancement in Multiple Regression
ERIC Educational Resources Information Center
Waller, Niels G.
2011-01-01
In linear multiple regression, "enhancement" is said to occur when R² = b′r > r′r, where b is a p × 1 vector of standardized regression coefficients and r is a p × 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b ≅ r and enhancement cannot…
Logistic Regression: Going beyond Point-and-Click.
ERIC Educational Resources Information Center
King, Jason E.
A review of the literature reveals that important statistical algorithms and indices pertaining to logistic regression are being underused. This paper describes logistic regression in comparison with discriminant analysis and linear regression, and suggests that some techniques only accessible through computer syntax should be consulted in…
Using Leverage and Influence to Introduce Regression Diagnostics.
ERIC Educational Resources Information Center
Hoaglin, David C.
1988-01-01
Techniques for teaching linear regression are provided. Discussed are leverage and the hat matrix in simple regression, residuals, the notion of leaving out each observation individually, and use of this to study influence on fitted values and to define residuals. Finally, corresponding diagnostics for multiple regression are discussed. (MNS)
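The leverage computation at the heart of that introduction fits in a few lines; the exaggerated last design point below is illustrative:

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]])
print(leverages(X))                          # the x = 20 point dominates
print("average =", X.shape[1] / X.shape[0])  # benchmark p/n
```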
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Multivariate Regression with Calibration
Liu, Han; Wang, Lie; Zhao, Tuo
2014-01-01
We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite-sample performance. Computationally, we develop an efficient smoothed proximal gradient algorithm which has a worst-case iteration complexity O(1/ε), where ε is a pre-specified numerical accuracy. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR is as competitive as the handcrafted model created by human experts. PMID:25620861
Metamorphic geodesic regression.
Hong, Yi; Joshi, Sarang; Sanchez, Mar; Styner, Martin; Niethammer, Marc
2012-01-01
We propose a metamorphic geodesic regression approach approximating spatial transformations for image time-series while simultaneously accounting for intensity changes. Such changes occur for example in magnetic resonance imaging (MRI) studies of the developing brain due to myelination. To simplify computations we propose an approximate metamorphic geodesic regression formulation that only requires pairwise computations of image metamorphoses. The approximated solution is an appropriately weighted average of initial momenta. To obtain initial momenta reliably, we develop a shooting method for image metamorphosis.
Tarpey, Thaddeus; Petkova, Eva
2010-07-01
Finite mixture models have come to play a very prominent role in modelling data. The finite mixture model is predicated on the assumption that distinct latent groups exist in the population. The finite mixture model therefore is based on a categorical latent variable that distinguishes the different groups. Often in practice distinct sub-populations do not actually exist. For example, disease severity (e.g. depression) may vary continuously and therefore, a distinction of diseased and not-diseased may not be based on the existence of distinct sub-populations. Thus, what is needed is a generalization of the finite mixture's discrete latent predictor to a continuous latent predictor. We cast the finite mixture model as a regression model with a latent Bernoulli predictor. A latent regression model is proposed by replacing the discrete Bernoulli predictor by a continuous latent predictor with a beta distribution. Motivation for the latent regression model arises from applications where distinct latent classes do not exist, but instead individuals vary according to a continuous latent variable. The shapes of the beta density are very flexible and can approximate the discrete Bernoulli distribution. Examples and a simulation are provided to illustrate the latent regression model. In particular, the latent regression model is used to model placebo effect among drug treated subjects in a depression study. PMID:20625443
Streamflow forecasting using functional regression
NASA Astrophysics Data System (ADS)
Masselot, Pierre; Dabo-Niang, Sophie; Chebana, Fateh; Ouarda, Taha B. M. J.
2016-07-01
Streamflow, as a natural phenomenon, is continuous in time, and so are the meteorological variables which influence its variability. In practice, it can be of interest to forecast the whole flow curve instead of points (daily or hourly). To this end, this paper introduces functional linear models and adapts them to hydrological forecasting. More precisely, functional linear models are regression models based on curves instead of single values. They make it possible to consider the whole process instead of a limited number of time points or features. We apply these models to analyse the flow volume and the whole streamflow curve during a given period by using precipitation curves. The functional model is shown to lead to encouraging results. The potential of functional linear models to detect special features that would have been hard to see otherwise is pointed out. The functional model is also compared to the artificial neural network approach, and the advantages and disadvantages of both models are discussed. Finally, future research directions involving the functional model in hydrology are presented.
A New Sample Size Formula for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…
Method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1972-01-01
Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.
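The same linearize-and-iterate idea survives in modern libraries: scipy's curve_fit applies a Taylor-series linearization (Levenberg-Marquardt) to a user-supplied exponential model, so a present-day equivalent of the program's job looks like this sketch:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.exp(b * x) + c

rng = np.random.default_rng(4)
x = np.linspace(0.0, 2.0, 50)
y = model(x, 2.0, 1.3, 0.5) + rng.normal(scale=0.05, size=50)

params, cov = curve_fit(model, x, y, p0=(1.0, 1.0, 0.0))
print("a, b, c =", params)
```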
Using Regression Analysis: A Guided Tour.
ERIC Educational Resources Information Center
Shelton, Fred Ames
1987-01-01
Discusses the use and interpretation of multiple regression analysis with computer programs and presents a flow chart of the process. A general explanation of the flow chart is provided, followed by an example showing the development of a linear equation which could be used in estimating manufacturing overhead cost. (Author/LRW)
Design Coding and Interpretation in Multiple Regression.
ERIC Educational Resources Information Center
Lunneborg, Clifford E.
The multiple regression or general linear model (GLM) is a parameter estimation and hypothesis testing model which encompasses and approaches the more familiar fixed effects analysis of variance (ANOVA). The transition from ANOVA to GLM is accomplished, roughly, by coding treatment level or group membership to produce a set of predictor or…
Reconstruction of missing daily streamflow data using dynamic regression models
NASA Astrophysics Data System (ADS)
Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault
2015-12-01
River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data in incomplete data sets is an important step for the performance of environmental models, engineering, and research applications, and it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access only to daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average (ARIMA) models, called a dynamic regression model. This model uses the linear relationship between neighboring and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data sets for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
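The two-step idea can be sketched as follows (series names hypothetical; the paper estimates the combined model, for which joint fitting via a regression-with-ARIMA-errors routine is the tighter equivalent):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA

def dynamic_regression(target: pd.Series, neighbor: pd.Series):
    """Regress a gauge on a correlated neighbor, then model the residuals."""
    X = sm.add_constant(neighbor)
    ols = sm.OLS(target, X, missing="drop").fit()
    resid_model = ARIMA(ols.resid, order=(1, 0, 1)).fit()  # illustrative order
    return ols, resid_model
```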
[Understanding logistic regression].
El Sanharawi, M; Naudet, F
2013-10-01
Logistic regression is one of the most common multivariate analysis models used in epidemiology. It measures the association between the occurrence of an event (a qualitative dependent variable) and factors liable to influence it (explanatory variables). The choice of explanatory variables to include in the logistic regression model is based on prior knowledge of the pathophysiology of the disease and on the statistical association between each variable and the event, as measured by the odds ratio. The main steps of the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of choosing the variables to include and retain in the regression model so as to avoid omitting important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.
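A minimal sketch, on made-up data, of the odds-ratio reading discussed above: exponentiated logistic regression coefficients estimate the association between each explanatory variable and the event. Variable names and true coefficients are illustrative assumptions.

```python
# Fit a logistic model and report odds ratios with 95% confidence intervals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
age = rng.normal(50, 10, n)
smoker = rng.binomial(1, 0.3, n)
logit = -6.0 + 0.08 * age + 0.9 * smoker              # assumed true model
event = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([age, smoker]))
fit = sm.Logit(event, X).fit(disp=False)
print(np.exp(fit.params))                             # odds ratios per unit
print(np.exp(fit.conf_int()))                         # 95% CIs on the OR scale
```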
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. One investigates the different risk factors for the occurrence of coronary heart disease. It has been proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science+Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
Explorations in Statistics: Regression
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2011-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive connection.…
Modern Regression Discontinuity Analysis
ERIC Educational Resources Information Center
Bloom, Howard S.
2012-01-01
This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…
Mechanisms of neuroblastoma regression
Brodeur, Garrett M.; Bagatell, Rochelle
2014-01-01
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to give the student practice with nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Hirozawa, Anne M; Montez-Rath, Maria E; Johnson, Elizabeth C; Solnit, Stephen A; Drennan, Michael J; Katz, Mitchell H; Marx, Rani
2016-01-01
We compared prospective risk adjustment models for adjusting patient panels at the San Francisco Department of Public Health. We used 4 statistical models (linear regression, two-part model, zero-inflated Poisson, and zero-inflated negative binomial) and 4 subsets of predictor variables (age/gender categories, chronic diagnoses, homelessness, and a loss to follow-up indicator) to predict primary care visit frequency. Predicted visit frequency was then used to calculate patient weights and adjusted panel sizes. The two-part model using all predictor variables performed best (R2 = 0.20). This model, designed specifically for safety net patients, may prove useful for panel adjustment in other public health settings. PMID:27576054
Resistors Improve Ramp Linearity
NASA Technical Reports Server (NTRS)
Kleinberg, L. L.
1982-01-01
Simple modification to bootstrap ramp generator gives more linear output over longer sweep times. New circuit adds just two resistors, one of which is adjustable. Modification cancels nonlinearities due to variations in load on charging capacitor and due to changes in charging current as the voltage across capacitor increases.
Lasso adjustments of treatment effect estimates in randomized experiments
Bloniarz, Adam; Liu, Hanzhong; Zhang, Cun-Hui; Sekhon, Jasjeet S.; Yu, Bin
2016-01-01
We provide a principled way for investigators to analyze randomized experiments when the number of covariates is large. Investigators often use linear multivariate regression to analyze randomized experiments instead of simply reporting the difference of means between treatment and control groups. Their aim is to reduce the variance of the estimated treatment effect by adjusting for covariates. If there are a large number of covariates relative to the number of observations, regression may perform poorly because of overfitting. In such cases, the least absolute shrinkage and selection operator (Lasso) may be helpful. We study the resulting Lasso-based treatment effect estimator under the Neyman–Rubin model of randomized experiments. We present theoretical conditions that guarantee that the estimator is more efficient than the simple difference-of-means estimator, and we provide a conservative estimator of the asymptotic variance, which can yield tighter confidence intervals than the difference-of-means estimator. Simulation and data examples show that Lasso-based adjustment can be advantageous even when the number of covariates is less than the number of observations. Specifically, a variant using Lasso for selection and ordinary least squares (OLS) for estimation performs particularly well, and it chooses a smoothing parameter based on combined performance of Lasso and OLS. PMID:27382153
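As a hedged illustration of the Lasso-then-OLS idea summarized above (not the authors' exact estimator or variance formula), the sketch below selects adjustment covariates with cross-validated Lasso and then re-estimates the treatment effect by OLS on the selected set. The data-generating process and all names are made up.

```python
# Lasso for covariate selection, OLS for treatment-effect estimation.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(3)
n, p = 200, 150                                   # many covariates vs. n
X = rng.normal(size=(n, p))
treat = rng.binomial(1, 0.5, n)
y = 1.0 * treat + X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Step 1: Lasso on the covariates to pick a sparse adjustment set.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# Step 2: OLS of the outcome on treatment plus the selected covariates.
Z = np.column_stack([treat, X[:, selected]])
ols = LinearRegression().fit(Z, y)
print("adjusted treatment effect estimate:", ols.coef_[0])
print("unadjusted difference of means:",
      y[treat == 1].mean() - y[treat == 0].mean())
```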
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and increases their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed instead. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set, comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-Validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
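For orientation, here is a minimal sketch of the non-robust baseline named above, classical principal component regression (CPCR): project the collinear regressors onto a few principal components and regress the response on the scores. RPCR replaces both stages with robust counterparts; the data below are illustrative assumptions.

```python
# Classical PCR as a two-stage pipeline: PCA scores, then least squares.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n) for _ in range(5)])  # collinear
y = 3.0 * z + rng.normal(0, 0.5, n)

cpcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)
print("R^2 of CPCR fit:", cpcr.score(X, y))
```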
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
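A hedged numerical sketch of the equivalence for the standard linear model: testing the slope has the same power as a two-sample comparison whose group means differ by the slope times twice the standard deviation of the covariate. The numbers below are illustrative assumptions, not the authors' worked example.

```python
# Sample size for a slope test via the equivalent two-sample problem.
import numpy as np
from statsmodels.stats.power import TTestIndPower

slope, sd_x, sigma = 0.5, 1.2, 2.0            # assumed slope, sd(x), residual sd
effect = slope * 2.0 * sd_x / sigma           # standardized two-sample difference
n_per_group = TTestIndPower().solve_power(effect_size=effect,
                                          alpha=0.05, power=0.80)
print("total n ~", int(np.ceil(2 * n_per_group)))
```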
Multicollinearity in cross-sectional regressions
NASA Astrophysics Data System (ADS)
Lauridsen, Jørgen; Mur, Jesùs
2006-10-01
The paper examines the robustness of results from cross-sectional regressions, paying attention to the impact of multicollinearity. It is well known that the reliability of estimators (least-squares or maximum-likelihood) deteriorates as the linear relationships between the regressors become more acute. We take up the discussion in a spatial context, looking closely into the behaviour shown, under several unfavourable conditions, by the most prominent misspecification tests when collinear variables are added to the regression. A Monte Carlo simulation is performed. The conclusions point to the fact that these statistics react in different ways to the problems posed.
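A minimal sketch of one standard collinearity diagnostic relevant to the discussion above: variance inflation factors (VIFs) for each regressor. The data and the near-collinear construction are illustrative assumptions.

```python
# Variance inflation factors flag regressors involved in near-collinearity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # nearly collinear with x1
x3 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for i in range(1, X.shape[1]):               # skip the constant column
    print(f"VIF(x{i}) = {variance_inflation_factor(X, i):.1f}")
```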
Linear models: permutation methods
Cade, B.S.; Everitt, B.S.; Howell, D.C.
2005-01-01
Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well known that estimates of the mean in a linear model are extremely sensitive to even a single outlying value of the dependent variable, compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution of responses, providing a more complete statistical picture that has relevance to many biological questions [6]...
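A minimal sketch of a permutation test for a regression slope, in the spirit of the methods above: shuffle the response to build the null distribution of the slope estimate and compare the observed slope against it. Data and the number of permutations are illustrative assumptions.

```python
# Permutation p-value for the least-squares slope of y on x.
import numpy as np

rng = np.random.default_rng(6)
n = 50
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(size=n)

def slope(x, y):
    # Least-squares slope via covariance over variance.
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

obs = slope(x, y)
perm = np.array([slope(x, rng.permutation(y)) for _ in range(9999)])
p = (1 + np.sum(np.abs(perm) >= abs(obs))) / (1 + len(perm))
print(f"slope = {obs:.3f}, permutation p = {p:.4f}")
```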
Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach
ERIC Educational Resources Information Center
Mesic, Vanes; Muratovic, Hasnija
2011-01-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
Poisson Regression Analysis of Illness and Injury Surveillance Data
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra-Poisson variation.
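A minimal sketch of the log-linear Poisson rate model described above: event counts with person-time at risk entering as an offset, so that exponentiated coefficients are rate ratios. The stratified counts and covariate coding below are illustrative assumptions, not DOE data.

```python
# Poisson regression of event counts with a log person-time offset.
import numpy as np
import statsmodels.api as sm

# Hypothetical stratified table: counts, person-years, age group, gender.
counts = np.array([12, 30, 25, 50, 8, 20])
pyears = np.array([1000.0, 1500.0, 1200.0, 1800.0, 800.0, 900.0])
age = np.array([0, 1, 0, 1, 0, 1])           # 0 = under 40, 1 = 40 and over
female = np.array([0, 0, 1, 1, 0, 1])

X = sm.add_constant(np.column_stack([age, female]))
fit = sm.GLM(counts, X, family=sm.families.Poisson(),
             offset=np.log(pyears)).fit()
print(np.exp(fit.params))                    # baseline rate and rate ratios
```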
Synthesizing regression results: a factored likelihood method.
Wu, Meng-Jia; Becker, Betsy Jane
2013-06-01
Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. This method uses the correlations reported in the regression studies to calculate synthesized standardized slopes. It uses available correlations to estimate missing ones through a series of regressions, allowing us to synthesize correlations among variables as if each included study contained all the same variables. Great accuracy and stability of this method under fixed-effects models were found through Monte Carlo simulation. An example was provided to demonstrate the steps for calculating the synthesized slopes through sweep operators. By rearranging the predictors in the included regression models or omitting a relatively small number of correlations from those models, we can easily apply the factored likelihood method to many situations involving synthesis of linear models. Limitations and other possible methods for synthesizing more complicated models are discussed. PMID:26053653
Development and validation of the Diagnostic Interview Adjustment Disorder (DIAD).
Cornelius, L R; Brouwer, S; de Boer, M R; Groothoff, J W; van der Klink, J J L
2014-06-01
Adjustment disorders (ADs) are under-researched due to the absence of a reliable and valid diagnostic tool. This paper describes the development and content/construct validation of a fully structured interview for the diagnosis of AD, the Diagnostic Interview Adjustment Disorder (DIAD). We developed the DIAD by partly adjusting and operationalizing DSM-IV criteria. Eleven experts were consulted on the content of the DIAD. In addition, the DIAD was administered by trained lay interviewers to a representative sample of disability claimants (n = 323). To assess construct validity of the DIAD, we explored the associations between the AD classification by the DIAD and summary scores of the Kessler Psychological Distress 10-item Scale (K10) and the World Health Organization Disability Assessment Schedule (WHODAS) by linear regression. Expert agreement on content of the DIAD was moderate to good. The prevalence of AD using the DIAD with revised criteria for the diagnosis AD was 7.4%. The associations of AD by the DIAD with average sum scores on the K10 and the WHODAS supported construct validity of the DIAD. The results provide a first indication that the DIAD is a valid instrument to diagnose AD. Further studies on reliability and on other aspects of validity are needed. PMID:24478059
Carmody, Karen Appleyard; Haskett, Mary E.; Loehman, Jessisca; Rose, Roderick A
2015-01-01
Childhood physical abuse predicts emotional/behavioral, self-regulatory, and social problems. Yet factors from multiple ecological levels contribute to children’s adjustment. The purpose of this study was to examine the degree to which the social-emotional adjustment of physically abused children in first grade would be predicted by a set of child-, parent-, and family-level predictors in kindergarten. Drawing on a short-term longitudinal study of 92 physically abused children and their primary caregivers, the current study used linear regression to examine early childhood child (i.e., gender, IQ, child perceptions of maternal acceptance), parent (i.e., parental mental health), and family relationship (i.e., sensitive parenting, hostile parenting, family conflict) factors as predictors of first grade internalizing and externalizing symptomatology, emotion dysregulation, and negative peer interactions. We used a multi-method, multi-informant approach to measuring predictors and children’s adjustment. Internalizing symptomatology was significantly predicted by child IQ, parental mental health, and family conflict. Externalizing symptomatology and emotion dysregulation were predicted by child IQ. Although a large proportion of variance in measures of adjustment was accounted for by the set of predictors, few individual variables were unique predictors of child adjustment. Variability in the predictors of adjustment for physically abused children underscores the need for individualized treatment approaches. PMID:26401095
Cautionary Remarks on the Use of Clusterwise Regression
ERIC Educational Resources Information Center
Brusco, Michael J.; Cradit, J. Dennis; Steinley, Douglas; Fox, Gavin L.
2008-01-01
Clusterwise linear regression is a multivariate statistical procedure that attempts to cluster objects with the objective of minimizing the sum of the error sums of squares for the within-cluster regression models. In this article, we show that the minimization of this criterion makes no effort to distinguish the error explained by the…
Quantile Regression in the Study of Developmental Sciences
ERIC Educational Resources Information Center
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
Regression Commonality Analysis: A Technique for Quantitative Theory Building
ERIC Educational Resources Information Center
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
Kramer, S.
1996-12-31
In many real-world domains the task of machine learning algorithms is to learn a theory for predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far no ILP algorithm except one can predict numbers and cope with nondeterminate background knowledge. (The only exception is a covering algorithm called FORS.) In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems. SRT integrates the statistical method of regression trees into ILP. It constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP systems cannot handle. Experiments in several real-world domains demonstrate that the approach is competitive with existing methods, indicating that the advantages are not at the expense of predictive accuracy.
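SRT itself is an ILP system operating on relational background knowledge; as a hedged illustration of only its statistical ingredient, here is an ordinary (propositional) regression tree on made-up tabular data, with a numerical value assigned to each leaf.

```python
# A plain regression tree: splits in the internal nodes, values in the leaves.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.where(X[:, 0] > 0, 3.0, -1.0) + 0.5 * X[:, 1] + rng.normal(0, 0.2, 300)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)   # value stored in each leaf
print("depth-3 tree R^2:", tree.score(X, y))
```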
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
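A minimal numerical sketch of the linearize-and-iterate scheme described above, for a decay model y = a * exp(b * t): expand in a Taylor series around the current estimate, solve the linear least-squares problem for a correction, and repeat until a predetermined criterion is satisfied. Starting values and data are illustrative assumptions.

```python
# Gauss-Newton iteration for nonlinear exponential regression.
import numpy as np

rng = np.random.default_rng(8)
t = np.linspace(0, 5, 40)
y = 2.0 * np.exp(-0.7 * t) + rng.normal(0, 0.02, t.size)

a, b = 1.0, -0.1                         # nominal initial estimates
for _ in range(20):
    f = a * np.exp(b * t)
    J = np.column_stack([np.exp(b * t), a * t * np.exp(b * t)])  # d f / d(a, b)
    delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)            # correction
    a, b = a + delta[0], b + delta[1]
    if np.linalg.norm(delta) < 1e-10:    # convergence criterion
        break

print(f"a ~ {a:.3f}, b ~ {b:.3f}")
```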
Realization of Ridge Regression in MATLAB
NASA Astrophysics Data System (ADS)
Dimitrov, S.; Kovacheva, S.; Prodanova, K.
2008-10-01
The least squares estimator (LSE) of the coefficients in the classical linear regression model is unbiased. In the case of multicollinearity among the columns of the design matrix, the LSE has a very large variance, i.e., the estimator is unstable. A more stable (but biased) estimator can be constructed using a ridge estimator (RE). In this paper the basic methods of obtaining ridge estimators and numerical procedures for their realization in MATLAB are considered. An application to a pharmacokinetics problem is presented.
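A minimal numerical sketch of the ridge estimator discussed above, beta(k) = (X'X + kI)^(-1) X'y, shown here in Python rather than MATLAB. The data and the ridge parameter k are illustrative assumptions.

```python
# Ridge regression in closed form, contrasted with the unstable LSE.
import numpy as np

rng = np.random.default_rng(9)
n = 80
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.01 * rng.normal(size=n)])   # collinear columns
y = X @ np.array([1.0, 1.0]) + rng.normal(0, 0.1, n)

def ridge(X, y, k):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

print("LSE (k=0):  ", ridge(X, y, 0.0))    # unstable under collinearity
print("ridge k=1.0:", ridge(X, y, 1.0))    # biased but far more stable
```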
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable for assessing the associations and interactions between different factors and the risk of periodontitis. Among other techniques, regression analysis is widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking Status and Plaque Index. The multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL); it yields the regression coefficients along with the p-values of the corresponding significance tests. The case (individual) classification adopted in the logistic model was the extent of destruction of periodontal tissues, defined as an Attachment Loss greater than or equal to 4 mm in at least 25% (AL≥4mm/≥25%) of the sites surveyed. The association measures include Odds Ratios together with the corresponding 95% confidence intervals.
Ahearn, Elizabeth A.
2010-01-01
Multiple linear regression equations for determining flow-duration statistics were developed to estimate select flow exceedances ranging from 25- to 99-percent for six 'bioperiods'-Salmonid Spawning (November), Overwinter (December-February), Habitat Forming (March-April), Clupeid Spawning (May), Resident Spawning (June), and Rearing and Growth (July-October)-in Connecticut. Regression equations also were developed to estimate the 25- and 99-percent flow exceedances without reference to a bioperiod. In total, 32 equations were developed. The predictive equations were based on regression analyses relating flow statistics from streamgages to GIS-determined basin and climatic characteristics for the drainage areas of those streamgages. Thirty-nine streamgages (and an additional 6 short-term streamgages and 28 partial-record sites for the non-bioperiod 99-percent exceedance) in Connecticut and adjacent areas of neighboring States were used in the regression analysis. Weighted least squares regression analysis was used to determine the predictive equations; weights were assigned based on record length. The basin characteristics-drainage area, percentage of area with coarse-grained stratified deposits, percentage of area with wetlands, mean monthly precipitation (November), mean seasonal precipitation (December, January, and February), and mean basin elevation-are used as explanatory variables in the equations. Standard errors of estimate of the 32 equations ranged from 10.7 to 156 percent with medians of 19.2 and 55.4 percent to predict the 25- and 99-percent exceedances, respectively. Regression equations to estimate high and median flows (25- to 75-percent exceedances) are better predictors (smaller variability of the residual values around the regression line) than the equations to estimate low flows (less than 75-percent exceedance). The Habitat Forming (March-April) bioperiod had the smallest standard errors of estimate, ranging from 10.7 to 20.9 percent. In
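A minimal sketch of the estimation approach described above: weighted least squares with weights assigned by record length, regressing a log flow statistic on a log basin characteristic. The streamgage data below are synthetic, illustrative assumptions, not the Connecticut data.

```python
# Weighted least squares for a regional flow-duration regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n_gages = 39
drainage = rng.uniform(5, 500, n_gages)              # basin characteristic
flow25 = 0.9 * drainage**0.95 * rng.lognormal(0, 0.1, n_gages)
years = rng.integers(10, 80, n_gages)                # record length per gage

X = sm.add_constant(np.log(drainage))
fit = sm.WLS(np.log(flow25), X, weights=years).fit() # weight by record length
print(fit.params)
```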
CSWS-related autistic regression versus autistic regression without CSWS.
Tuchman, Roberto
2009-08-01
Continuous spike-waves during slow-wave sleep (CSWS) and Landau-Kleffner syndrome (LKS) are two clinical epileptic syndromes that are associated with the electroencephalography (EEG) pattern of electrical status epilepticus during slow wave sleep (ESES). Autistic regression occurs in approximately 30% of children with autism and is associated with an epileptiform EEG in approximately 20%. The behavioral phenotypes of CSWS, LKS, and autistic regression overlap. However, the differences in age of regression, degree and type of regression, and frequency of epilepsy and EEG abnormalities suggest that these are distinct phenotypes. CSWS with autistic regression is rare, as is autistic regression associated with ESES. The pathophysiology and as such the treatment implications for children with CSWS and autistic regression are distinct from those with autistic regression without CSWS.
An empirical evaluation of spatial regression models
NASA Astrophysics Data System (ADS)
Gao, Xiaolu; Asami, Yasushi; Chung, Chang-Jo F.
2006-10-01
Conventional statistical methods are often ineffective for evaluating spatial regression models. One reason is that spatial regression models usually have more parameters or smaller sample sizes than a simple model, so their degrees of freedom are reduced; it is thus often impossible to evaluate them with traditional tests. Another reason, theoretically associated with statistical methods, is that statistical criteria depend crucially on assumptions such as normality, independence, and homogeneity, which may create problems because these assumptions are themselves open to question. In view of these problems, this paper proposes an alternative empirical evaluation method. To illustrate the idea, a few hedonic regression models for a house and land price data set are evaluated, including a simple ordinary linear regression model and three spatial models. Their performance as to how well the price of the house and land can be predicted is examined. With a cross-validation technique, the price at each sample point is predicted with a model estimated from the samples excluding the point concerned. Empirical criteria are then established whereby the predicted prices are compared with the real, observed prices. The proposed method provides objective guidance for selecting a suitable model specification for a data set. Moreover, the method can be seen as an alternative way to test the significance of the spatial relationships considered in spatial regression models.
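A minimal sketch of the cross-validation scheme described above: predict each sample point from a model fitted to all other points, then compare candidate models by out-of-sample error. The models (a k-nearest-neighbours regressor stands in for a spatial model) and the data are illustrative assumptions.

```python
# Leave-one-out comparison of a linear model and a spatial-style model.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(11)
X = rng.uniform(0, 10, size=(100, 2))                 # e.g. location coordinates
price = 50 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 2, 100)

for name, model in [("ordinary linear", LinearRegression()),
                    ("spatial neighbour", KNeighborsRegressor(n_neighbors=5))]:
    mse = -cross_val_score(model, X, price, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: leave-one-out MSE = {mse:.2f}")
```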
Response-adaptive regression for longitudinal data.
Wu, Shuang; Müller, Hans-Georg
2011-09-01
We propose a response-adaptive model for functional linear regression, which is adapted to sparsely sampled longitudinal responses. Our method aims at predicting response trajectories and models the regression relationship by directly conditioning the sparse and irregular observations of the response on the predictor, which can be of scalar, vector, or functional type. This obliterates the need to model the response trajectories, a task that is challenging for sparse longitudinal data and was previously required for functional regression implementations for longitudinal data. The proposed approach turns out to be superior compared to previous functional regression approaches in terms of prediction error. It encompasses a variety of regression settings that are relevant for the functional modeling of longitudinal data in the life sciences. The improved prediction of response trajectories with the proposed response-adaptive approach is illustrated for a longitudinal study of Kiwi weight growth and by an analysis of the dynamic relationship between viral load and CD4 cell counts observed in AIDS clinical trials. PMID:21133880
Sidorin, Anatoly
2010-01-05
In linear accelerators the particles are accelerated by either electrostatic fields or oscillating radio frequency (RF) fields. Accordingly, linear accelerators are divided into three large groups: electrostatic, induction and RF accelerators. An overview of the different types of accelerators is given. The stability of longitudinal and transverse motion in RF linear accelerators is briefly discussed. The methods of beam focusing in linacs are described.
ADJUSTABLE DOUBLE PULSE GENERATOR
Gratian, J.W.; Gratian, A.C.
1961-08-01
A modulator pulse source having adjustable pulse width and adjustable pulse spacing is described. The generator consists of a cross coupled multivibrator having adjustable time constant circuitry in each leg, an adjustable differentiating circuit in the output of each leg, a mixing and rectifying circuit for combining the differentiated pulses and generating in its output a resultant sequence of negative pulses, and a final amplifying circuit for inverting and square-topping the pulses. (AEC)
Meteorological adjustment of yearly mean values for air pollutant concentration comparison
NASA Technical Reports Server (NTRS)
Sidik, S. M.; Neustadter, H. E.
1976-01-01
Using multiple linear regression analysis, models which estimate mean concentrations of Total Suspended Particulate (TSP), sulfur dioxide, and nitrogen dioxide as a function of several meteorologic variables, two rough economic indicators, and a simple trend in time are studied. Meteorologic data were obtained and do not include inversion heights. The goodness of fit of the estimated models is partially reflected by the squared coefficient of multiple correlation which indicates that, at the various sampling stations, the models accounted for about 23 to 47 percent of the total variance of the observed TSP concentrations. If the resulting model equations are used in place of simple overall means of the observed concentrations, there is about a 20 percent improvement in either: (1) predicting mean concentrations for specified meteorological conditions; or (2) adjusting successive yearly averages to allow for comparisons devoid of meteorological effects. An application to source identification is presented using regression coefficients of wind velocity predictor variables.
Teaching and hospital production: the use of regression estimates.
Lehner, L A; Burgess, J F
1995-01-01
Medicare's Prospective Payment System pays U.S. teaching hospitals for the indirect costs of medical education based on a regression coefficient in a cost function. In regression studies using health care data, it is common for explanatory variables to be measured imperfectly, yet the potential for measurement error is often ignored. In this paper, U.S. Department of Veterans Affairs data is used to examine issues of health care production estimation and the use of regression estimates like the teaching adjustment factor. The findings show that measurement error and persistent multicollinearity confound attempts to have a large degree of confidence in the precise magnitude of parameter estimates.
NASA Astrophysics Data System (ADS)
Yamamoto, Akira; Yokoya, Kaoru
2015-02-01
An overview of linear collider programs is given. The history and technical challenges are described and the pioneering electron-positron linear collider, the SLC, is first introduced. For future energy frontier linear collider projects, the International Linear Collider (ILC) and the Compact Linear Collider (CLIC) are introduced and their technical features are discussed. The ILC is based on superconducting RF technology and the CLIC is based on two-beam acceleration technology. The ILC collaboration completed the Technical Design Report in 2013, and has come to the stage of "Design to Reality." The CLIC collaboration published the Conceptual Design Report in 2012, and the key technology demonstration is in progress. The prospects for further advanced acceleration technology are briefly discussed for possible long-term future linear colliders.
Chia, Kim-seng; Abdul Rahim, Herlina; Abdul Rahim, Ruzairi
2012-01-01
Visible and near infrared spectroscopy is a non-destructive, green, and rapid technology that can be used to estimate components of interest without conditioning the sample, in contrast with classical analytical methods. The objective of this paper is to compare the performance of an artificial neural network (ANN) (a nonlinear model) and principal component regression (PCR) (a linear model), based on visible and shortwave near infrared (VIS-SWNIR) (400–1000 nm) spectra, in the non-destructive measurement of the soluble solids content of an apple. First, we used multiplicative scattering correction to pre-process the spectral data. Second, PCR was applied to estimate the optimal number of input variables. Third, that optimal set of input variables was used as the input to both multiple linear regression and ANN models. The initial weights and the number of hidden neurons were adjusted to optimize the performance of the ANN. Findings suggest that the ANN with two hidden neurons outperforms PCR. PMID:22302428
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.
2000-01-01
The NASA Engine Performance Program (NEPP) can configure and analyze almost any type of gas turbine engine that can be generated through the interconnection of a set of standard physical components. In addition, the code can optimize engine performance by changing adjustable variables under a set of constraints. However, for engine cycle problems at certain operating points, the NEPP code can encounter difficulties: nonconvergence in the currently implemented Powell's optimization algorithm and deficiencies in the Newton-Raphson solver during engine balancing. A project was undertaken to correct these deficiencies. Nonconvergence was avoided through a cascade optimization strategy, and deficiencies associated with engine balancing were eliminated through neural network and linear regression methods. An approximation-interspersed cascade strategy was used to optimize the engine's operation over its flight envelope. Replacement of Powell's algorithm by the cascade strategy improved the optimization segment of the NEPP code. The performance of the linear regression and neural network methods as alternative engine analyzers was found to be satisfactory. This report considers two examples, a supersonic mixed-flow turbofan engine and a subsonic waverotor-topped engine, to illustrate the results, and it discusses insights gained from the improved version of the NEPP code.
Bootstrap inference longitudinal semiparametric regression model
NASA Astrophysics Data System (ADS)
Pane, Rahmawati; Otok, Bambang Widjanarko; Zain, Ismaini; Budiantara, I. Nyoman
2016-02-01
Semiparametric regression contains two components, i.e. a parametric and a nonparametric component. The semiparametric regression model is represented by $y_{ti} = \mu(\tilde{x}'_{ti}, z_{ti}) + \varepsilon_{ti}$, where $\mu(\tilde{x}'_{ti}, z_{ti}) = \tilde{x}'_{ti}\tilde{\beta} + g(z_{ti})$ and $y_{ti}$ is the response variable. It is assumed to have a linear relationship with the predictor variables $\tilde{x}'_{ti} = (x_{ti1}, x_{ti2}, \ldots, x_{tir})$. The random errors $\varepsilon_{ti}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, are normally distributed with zero mean and variance $\sigma^2$, and $g(z_{ti})$ is the nonparametric component. The results of this study show that the PLS approach to longitudinal semiparametric regression models yields the estimators $\hat{\beta} = [X'H(\lambda)X]^{-1}X'H(\lambda)\tilde{y}$ and $\hat{g}_{\lambda}(z) = M(\lambda)\tilde{y}$. The results also show that the bootstrap is valid for the longitudinal semiparametric regression model, with $\hat{g}^{(b)}_{\lambda}(z)$ as the nonparametric component estimator.
Prediction of dynamical systems by symbolic regression.
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K; Noack, Bernd R
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast. PMID:27575130
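A minimal sketch in the spirit of fast function extraction as characterized above (a generalized linear regression over a library of candidate functions); this is not the FFX package itself. The candidate library, data, and the sparsity level are illustrative assumptions.

```python
# Sparse linear regression over a library of candidate basis functions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(12)
t = np.linspace(0, 10, 400)
x = np.cos(2.0 * t) + 0.05 * rng.normal(size=t.size)   # harmonic oscillator obs

# Candidate functions of time; the sparse fit should pick out cos(2t).
library = np.column_stack([t, t**2, np.sin(t), np.cos(t),
                           np.sin(2 * t), np.cos(2 * t), np.exp(-t)])
names = ["t", "t^2", "sin t", "cos t", "sin 2t", "cos 2t", "exp(-t)"]

fit = Lasso(alpha=0.01).fit(library, x)
for name, c in zip(names, fit.coef_):
    if abs(c) > 1e-2:
        print(f"{name}: {c:+.3f}")
```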
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local
ERIC Educational Resources Information Center
Walkiewicz, T. A.; Newby, N. D., Jr.
1972-01-01
A discussion of linear collisions between two or three objects is related to a junior-level course in analytical mechanics. The theoretical discussion uses a geometrical approach that treats elastic and inelastic collisions from a unified point of view. Experiments with a linear air track are described. (Author/TS)
Quantile regression for climate data
NASA Astrophysics Data System (ADS)
Marasinghe, Dilhani Shalika
Quantile regression is a developing statistical tool for explaining the relationship between response and predictor variables. This thesis describes two climatology examples using quantile regression. Our main goal is to estimate derivatives of a conditional mean and/or conditional quantile function. We introduce a method to handle autocorrelation in the framework of quantile regression and use it with temperature data. We also explain some properties of tornado data, which are non-normally distributed. Even though quantile regression provides a more comprehensive view, when the residuals satisfy the normality and constant variance assumptions we would prefer least squares regression, as in our temperature analysis. When normality and constant variance cannot be assumed, quantile regression is the better candidate for estimating the derivative.
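A minimal sketch of quantile regression on a climate-style variable with non-constant variance, contrasting the 0.1, 0.5 and 0.9 conditional quantiles: under heteroscedasticity the slopes diverge across quantiles, which a mean regression cannot show. The data are illustrative assumptions.

```python
# Quantile regression at several quantile levels with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 500
year = rng.uniform(0, 40, n)
temp = 10 + 0.03 * year + (0.5 + 0.05 * year) * rng.normal(size=n)  # heteroscedastic

X = sm.add_constant(year)
for q in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(temp, X).fit(q=q)
    print(f"q={q}: slope = {fit.params[1]:.4f}")   # slopes spread apart
```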
Risk-adjusted monitoring of survival times
Sego, Landon H.; Reynolds, Marion R.; Woodall, William H.
2009-02-26
We consider the monitoring of clinical outcomes, where each patient has a different risk of death prior to undergoing a health care procedure. We propose a risk-adjusted survival time CUSUM chart (RAST CUSUM) for monitoring clinical outcomes where the primary endpoint is a continuous, time-to-event variable that may be right censored. Risk adjustment is accomplished using accelerated failure time regression models. We compare the average run length performance of the RAST CUSUM chart to the risk-adjusted Bernoulli CUSUM chart, using data from cardiac surgeries to motivate the details of the comparison. The comparisons show that the RAST CUSUM chart is more efficient at detecting a sudden decrease in the odds of death than the risk-adjusted Bernoulli CUSUM chart, especially when the fraction of censored observations is not too high. We also discuss the implementation of a prospective monitoring scheme using the RAST CUSUM chart.
40 CFR 1066.220 - Linearity verification.
Code of Federal Regulations, 2012 CFR
2012-07-01
... specified in Table 1 of this section. Use the calculations described in 40 CFR 1065.602. Using good...-squares linear regression and the linearity criteria specified in Table 1 of this section. (b) Performance requirements. If a measurement system does not meet the applicable linearity criteria in Table 1 of...
Baseline adjustment increases accurate interpretation of posturographic sway scores.
Tietäväinen, A; Corander, J; Hæggström, E
2015-09-01
Postural steadiness may be quantified using posturographic sway measures. These measures are commonly used to differentiate between a person's baseline balance and balance related to some physiological condition. However, the difference in sway scores between the two conditions may be difficult to detect due to large inter-subject variation. We compared detection accuracy provided by three models that linearly regress a sway measure (mean distance, velocity, or frequency) on the effect of eye closure on balance (eyes open (EO) vs. eyes closed (EC)). In Model 1 the dependent variable is a single sway score (EO or EC), whereas in Models 2 and 3 it is a change score (EO-EO or EC-EO). The independent variable is always the group (group=0: EO or group=1: EC). Model 3 also accounts for the regression to the mean effect (RTM), by considering the baseline value (EO) as a covariate. When differentiating between EO and EC conditions, 94% accuracy can be achieved when using mean velocity as sway measure and either Model 2 or 3. Thus by adjusting for baseline score one increases the accurate interpretation of posturographic sway scores.
Retro-regression--another important multivariate regression improvement.
Randić, M
2001-01-01
We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA. PMID:11410035
Transfer Learning Based on Logistic Regression
NASA Astrophysics Data System (ADS)
Paul, A.; Rottensteiner, F.; Heipke, C.
2015-08-01
In this paper we address the problem of classification of remote sensing images in the framework of transfer learning, with a focus on domain adaptation. The main novel contribution is a method for transductive transfer learning in remote sensing on the basis of logistic regression. Logistic regression is a discriminative probabilistic classifier of low computational complexity which can deal with multiclass problems. This research area deals with methods that solve problems in which labelled training data sets are assumed to be available only for a source domain, while classification is needed in a target domain with different, yet related, characteristics. Classification takes place with a model of weight coefficients for hyperplanes which separate features in the transformed feature space. In terms of logistic regression, our domain adaptation method adjusts the model parameters by iteratively labelling the target test data set. These labelled data features are iteratively added to the current training set, which, at the beginning, contains only source features; simultaneously, a number of source features are deleted from the current training set. Experimental results based on a test series with synthetic and real data constitute a first proof-of-concept of the proposed method.
Spatial correlation in Bayesian logistic regression with misclassification.
Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose; Ersbøll, Annette Kjær
2014-06-01
Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data with spatial correlation. The models assessed include a fixed effects model, an independent random effects model, and models with spatially correlated random effects modelled using conditional autoregressive prior distributions (ICAR and ICAR(ρ)). Performance of these models was evaluated in a simulation study. Parameters were estimated by Markov Chain Monte Carlo methods, using slice sampling to improve convergence. The results demonstrated that adjustment for misclassification must be included to produce unbiased regression estimates. With strong correlation the ICAR model performed best. With weak or moderate correlation the ICAR(ρ) performed best. With unknown spatial correlation the recommended model would be the ICAR(ρ), assuming convergence can be obtained. PMID:24889989
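Setting the spatial random effects aside, the core misclassification adjustment has a simple likelihood: with known sensitivity Se and specificity Sp, the probability of an observed positive is Se·p + (1−Sp)·(1−p), where p is the true-outcome probability from the logistic model. A minimal maximum-likelihood sketch follows (the paper's full model is Bayesian with CAR priors, which this omits):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))       # true outcome probability
y_true = rng.binomial(1, p_true)
se, sp = 0.9, 0.95                                  # assumed known sensitivity/specificity
y_obs = np.where(y_true == 1, rng.binomial(1, se, n), rng.binomial(1, 1 - sp, n))

def negloglik(beta):
    p = 1 / (1 + np.exp(-(beta[0] + beta[1] * x)))
    q = se * p + (1 - sp) * (1 - p)                 # P(observed positive | x)
    return -np.sum(y_obs * np.log(q) + (1 - y_obs) * np.log(1 - q))

fit = minimize(negloglik, x0=[0.0, 0.0])
print("misclassification-adjusted estimates:", fit.x)   # close to (-0.5, 1.2)
```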
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
In this study, the regression procedure of the simultaneous item bias test (SIBTEST) method and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously. These procedures are used to adjust for the effect of matching true score on observed score and to better control the Type I error…
Can luteal regression be reversed?
Telleria, Carlos M
2006-01-01
The corpus luteum is an endocrine gland whose limited lifespan is hormonally programmed. This debate article summarizes findings of our research group that challenge the principle that the end of function of the corpus luteum or luteal regression, once triggered, cannot be reversed. Overturning luteal regression by pharmacological manipulations may be of critical significance in designing strategies to improve fertility efficacy. PMID:17074090
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
MM-Estimator and Adjusted Super Smoother based Simultaneous Prediction Confidence Intervals
2002-07-19
A novel application of regression analysis (MM-estimator) with simultaneous prediction confidence intervals is proposed to detect up- or down-regulated genes, which are outliers in scatter plots based on log-transformed red (Cy5 fluorescent dye) versus green (Cy3 fluorescent dye) intensities. Advantages of the application: 1) the robust and resistant MM-estimator is a reliable method for building a linear regression in the presence of outliers; 2) exploratory data analysis tools (boxplots, averaged shifted histograms, quantile-quantile normal plots and scatter plots) are used to visually test the underlying assumptions of linearity and contaminated normality in microarray data; 3) simultaneous prediction confidence intervals (SPCIs) guarantee a desired confidence level across the whole range of data points used for the scatter plots. The result of the outlier detection procedure is a set of significantly differentially expressed genes extracted from the employed microarray data set. A scatter plot smoother (super smoother or locally weighted regression) is used to quantify heteroscedasticity in residual variance (which commonly occurs in the lower- and higher-intensity regions). The set of differentially expressed genes is quantified using interval estimates of P-values as a probabilistic measure of being an outlier by chance. Monte Carlo simulations are used to adjust the super smoother-based SPCIs.
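statsmodels does not ship a full MM-estimator (R's MASS::rlm(method="MM") provides one), so the sketch below substitutes a Huber M-estimator and a crude residual band in place of the simultaneous prediction confidence intervals; synthetic two-channel intensities stand in for the microarray data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
green = rng.uniform(6, 14, 500)                    # log2 Cy3 intensity
red = green + rng.normal(0, 0.3, 500)              # log2 Cy5 intensity, mostly y = x
red[:10] += 3                                      # a few up-regulated genes (outliers)

X = sm.add_constant(green)
rlm = sm.RLM(red, X, M=sm.robust.norms.HuberT()).fit()   # robust fit resists outliers
resid = red - rlm.predict(X)
flagged = np.abs(resid) > 3 * rlm.scale            # crude band standing in for SPCIs
print("flagged genes:", np.where(flagged)[0])
```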
[Regression grading in gastrointestinal tumors].
Tischoff, I; Tannapfel, A
2012-02-01
Preoperative neoadjuvant chemoradiation therapy is a well-established and essential part of the interdisciplinary treatment of gastrointestinal tumors. Neoadjuvant treatment leads to regressive changes in tumors. To evaluate the histological tumor response, different scoring systems describing regressive changes are used, known collectively as tumor regression grading. Tumor regression grading is usually based on the proportion of residual vital tumor cells relative to the total tumor size. Currently, no nationally or internationally accepted grading system exists. In general, common guidelines should be used in the pathohistological diagnostics of tumors after neoadjuvant therapy. In particular, the standard tumor grading will be replaced by tumor regression grading. Furthermore, tumors after neoadjuvant treatment are marked with the prefix "y" in the TNM classification. PMID:22293790
Christofilos, N.C.; Polk, I.J.
1959-02-17
Improvements in linear particle accelerators are described. A drift tube system for a linear ion accelerator reduces gap capacity between adjacent drift tube ends. This is accomplished by reducing the ratio of the diameter of the drift tube to the diameter of the resonant cavity. Concentration of magnetic field intensity at the longitudinal midpoint of the external surface of each drift tube is reduced by increasing the external drift tube diameter at the longitudinal center region.
Ballistic limit curve regression for Freedom Station orbital debris shields
NASA Technical Reports Server (NTRS)
Jolly, William H.; Williamsen, Joel W.
1992-01-01
A procedure utilized at Marshall Space Flight Center to formulate ballistic limit curves for the Space Station Freedom's manned module orbital debris shields is presented. A stepwise linear least squares regression method similar to that employed by Burch (1967) is used to relate a penetration parameter to various projectile and target descriptors. A stepwise regression was also conducted with the model reduced to lower forms, thus eliminating the effects of generalized assumptions.
Yang, Xue; Lauzon, Carolyn B.; Crainiceanu, Ciprian; Caffo, Brian; Resnick, Susan M.; Landman, Bennett A.
2012-01-01
Massively univariate regression and inference in the form of statistical parametric mapping have transformed the way in which multi-dimensional imaging data are studied. In functional and structural neuroimaging, the de facto standard “design matrix”-based general linear regression model and its multi-level cousins have enabled investigation of the biological basis of the human brain. With modern study designs, it is possible to acquire multi-modal three-dimensional assessments of the same individuals — e.g., structural, functional and quantitative magnetic resonance imaging, alongside functional and ligand binding maps with positron emission tomography. Largely, current statistical methods in the imaging community assume that the regressors are non-random. For more realistic multi-parametric assessment (e.g., voxel-wise modeling), distributional consideration of all observations is appropriate. Herein, we discuss two unified regression and inference approaches, model II regression and regression calibration, for use in massively univariate inference with imaging data. These methods use the design matrix paradigm and account for both random and non-random imaging regressors. We characterize these methods in simulation and illustrate their use on an empirical dataset. Both methods have been made readily available as a toolbox plug-in for the SPM software. PMID:22609453
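As one concrete instance of model II regression, Deming regression corrects the attenuation that ordinary least squares suffers when the regressor is itself measured with error. The closed-form slope below assumes a known error-variance ratio δ (set to 1 here); it is a generic sketch, not the authors' SPM toolbox:

```python
import numpy as np

rng = np.random.default_rng(15)
true = rng.normal(0, 2, 300)
x = true + rng.normal(0, 0.5, 300)                 # regressor measured with error
y = 1.5 * true + rng.normal(0, 0.5, 300)

sxx = np.var(x, ddof=1); syy = np.var(y, ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
d = 1.0                                            # assumed error-variance ratio (delta)
slope = (syy - d * sxx + np.sqrt((syy - d * sxx) ** 2 + 4 * d * sxy ** 2)) / (2 * sxy)
print("OLS slope (attenuated):", sxy / sxx)
print("Deming slope:", slope)                      # close to the true value 1.5
```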
ERIC Educational Resources Information Center
Kaplan, David
2005-01-01
This article considers the problem of estimating dynamic linear regression models when the data are generated from finite mixture probability density function where the mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single…
The application of quantile regression in autumn precipitation forecasting over Southeastern China
NASA Astrophysics Data System (ADS)
Wu, Baoqiang; Yuan, Huiling
2014-05-01
This study applies the quantile regression method to seasonal forecasts of autumn precipitation over Southeastern China. The dataset includes daily precipitation of 195 gauge stations over Southeastern China, and monthly means of circulation indices, global Sea Surface Temperature (SST), and 500 hPa geopotential height. First, using the data from 1961 to 2000 for training, the predictors are chosen by stepwise regression and the prognostic equations of autumn total precipitation are created for each station using the traditional linear regression method. Similarly, the 0.5 quantile regression (median regression) is used to generate the prognostic equations for individual stations. Afterwards, using the data from 2001 to 2007 for validation, the autumn precipitation is forecasted using quantile regression and traditional linear regression respectively. Compared to traditional linear regression, the median regression has better forecast skills in terms of anomaly correlation coefficients, especially in the regions of north Guangxi Province and west Hunan Province. Furthermore, for each station, quantile regression can also estimate a confidence interval of autumn total precipitation using multiple quantiles, providing the range of uncertainties for predicting extreme seasonal precipitation. Keywords: quantile regression, precipitation, linear regression, seasonal forecasts
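Median regression of the kind used in this study is available in statsmodels. The sketch below uses a single made-up predictor in place of the circulation, SST and geopotential-height predictors, and also shows how multiple quantiles yield an interval forecast:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"sst_index": rng.normal(size=40)})          # illustrative predictor
df["precip"] = 100 + 25 * df.sst_index + rng.gamma(2, 10, 40)  # skewed errors

ols = smf.ols("precip ~ sst_index", df).fit()
med = smf.quantreg("precip ~ sst_index", df).fit(q=0.5)        # median regression
lo  = smf.quantreg("precip ~ sst_index", df).fit(q=0.1)        # bounds of an
hi  = smf.quantreg("precip ~ sst_index", df).fit(q=0.9)        # 80% interval forecast
print(ols.params, med.params, sep="\n")
```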
ERIC Educational Resources Information Center
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Many regression algorithms, one unified model: A review.
Stulp, Freek; Sigaud, Olivier
2015-09-01
Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate how the function representation whose parameters they regress falls into two classes: a weighted sum of basis functions, or a mixture of linear models. Furthermore, we show that the former is a special case of the latter. Our ambition is thus to provide a deep understanding of the relationships between these algorithms which, despite being derived from very different principles, use a function representation that can be captured within one unified model. Finally, step-by-step derivations of the algorithms from first principles and visualizations of their inner workings allow this article to be used as a tutorial for those new to regression.
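The first of the two function classes, a weighted sum of fixed basis functions, reduces to ordinary linear least squares once the basis is chosen. A minimal sketch with Gaussian radial basis functions:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 100)

centers = np.linspace(0, 1, 10)                    # fixed basis function centers
width = 0.1
Phi = np.exp(-(x[:, None] - centers) ** 2 / (2 * width**2))   # design matrix
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # regress the basis weights
y_hat = Phi @ w
print("training RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```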
Quantiles Regression Approach to Identifying the Determinant of Breastfeeding Duration
NASA Astrophysics Data System (ADS)
Mahdiyah; Norsiah Mohamed, Wan; Ibrahim, Kamarulzaman
In this study, the quantile regression approach is applied to data from the Malaysian Family Life Survey (MFLS) to identify factors which are significantly related to different conditional quantiles of breastfeeding duration. Whereas the classical linear regression methods are based on minimizing the residual sum of squares, quantile regression uses a mechanism based on the conditional median function and the full range of other conditional quantile functions. Overall, it is found that the period of breastfeeding is significantly related to place of living, religion, and total number of children in the family.
Using ridge regression in systematic pointing error corrections
NASA Technical Reports Server (NTRS)
Guiar, C. N.
1988-01-01
A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
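A minimal illustration of the effect ridge regression is used for here: under near-collinear regressors the least squares coefficients become unstable, while the biased ridge estimates stay well behaved (synthetic data standing in for the pointing-model variables):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(6)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.01, 100)         # nearly collinear regressor
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 0.5, 100)

print(LinearRegression().fit(X, y).coef_)  # unstable, often a huge +/- pair
print(Ridge(alpha=1.0).fit(X, y).coef_)    # biased but near the true (1, 1)
```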
Initial external validation of REGRESS in public health graduate students.
Kidwell, Kelley M; Enders, Felicity B
2014-12-01
Linear regression is typically taught as a second and potentially last required (bio)statistics course for Public Health and Clinical and Translational Science students. There has been much research on the attitudes of students toward basic biostatistics, but there has not been much assessing students' understanding of critical regression topics. The REGRESS (REsearch on Global Regression Expectations in StatisticS) quiz developed at Mayo Clinic utilizes 27 questions to assess understanding for simple and multiple linear regression. We performed an initial external validation of this tool with 117 University of Michigan public health students. We compare the results of pre- and postcourse quiz scores from the Michigan cohort to scores of Mayo medical students and professional statisticians. University of Michigan students performed higher than Mayo students on the precourse quiz due to previous related coursework, but did not perform as high postcourse indicating the need for course modification. In the Michigan cohort, REGRESS scores improved by a mean (standard deviation) of 4.6 (3.4), p < 0.0001. Our results support the use of the REGRESS quiz as a learning tool for students and an evaluation tool to identify topics for curricular improvement for teachers, while we highlight future directions of research. PMID:25041650
Face Alignment via Regressing Local Binary Features.
Ren, Shaoqing; Cao, Xudong; Wei, Yichen; Sun, Jian
2016-03-01
This paper presents a highly efficient and accurate regression approach for face alignment. Our approach has two novel components: 1) a set of local binary features and 2) a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independently. The obtained local binary features are used to jointly learn a linear regression for the final output. This approach achieves state-of-the-art results when tested on the most challenging benchmarks to date. Furthermore, because extracting and regressing local binary features is computationally very cheap, our system is much faster than previous methods. It achieves over 3000 frames per second (FPS) on a desktop or 300 FPS on a mobile phone for locating a few dozen landmarks. We also study a key issue that is important but has received little attention in previous research: the face detector used to initialize alignment. We investigate several face detectors and perform quantitative evaluation of how they affect alignment accuracy. We find that an alignment-friendly detector can further greatly boost the accuracy of our alignment method, reducing the error by up to 16% in relative terms. To facilitate practical usage of face detection/alignment methods, we also propose a convenient metric to measure how good a detector is for alignment initialization.
Regression models of sprint, vertical jump, and change of direction performance.
Swinton, Paul A; Lloyd, Ray; Keogh, Justin W L; Agouris, Ioannis; Stewart, Arthur D
2014-07-01
It was the aim of the present study to expand on previous correlation analyses that have attempted to identify factors that influence performance of jumping, sprinting, and changing direction. This was achieved by using a regression approach to obtain linear models that combined anthropometric, strength, and other biomechanical variables. Thirty rugby union players participated in the study (age: 24.2 ± 3.9 years; stature: 181.2 ± 6.6 cm; mass: 94.2 ± 11.1 kg). The athletes' ability to sprint, jump, and change direction was assessed using a 30-m sprint, vertical jump, and 505 agility test, respectively. Regression variables were collected during maximum strength tests (1 repetition maximum [1RM] deadlift and squat) and performance of fast velocity resistance exercises (deadlift and jump squat) using submaximum loads (10-70% 1RM). Force, velocity, power, and rate of force development (RFD) values were measured during fast velocity exercises, with the greatest values produced across loads selected for further analysis. Anthropometric data, including lengths, widths, and girths, were collected using a 3-dimensional body scanner. Potential regression variables were first identified using correlation analyses. Suitable variables were then regressed using a best subsets approach. Three-factor models generally provided the most appropriate balance between explained variance and model complexity. Adjusted R² values of 0.86, 0.82, and 0.67 were obtained for sprint, jump, and change of direction performance, respectively. Anthropometric measurements did not feature in any of the top models because of their strong association with body mass. For each performance measure, variance was best explained by relative maximum strength. Improvements in models were then obtained by including velocity and power values for jumping and sprinting performance, and by including RFD values for change of direction performance. PMID:24345969
Regional regression of flood characteristics employing historical information
Tasker, Gary D.; Stedinger, J.R.
1987-01-01
Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100 yr peak, at ungauged sites as functions of drainage basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust this generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. © 1987.
Multiple Regression and Its Discontents
ERIC Educational Resources Information Center
Snell, Joel C.; Marsh, Mitchell
2012-01-01
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
Regression methods for spatial data
NASA Technical Reports Server (NTRS)
Yakowitz, S. J.; Szidarovszky, F.
1982-01-01
The kriging approach, a parametric regression method used by hydrologists and mining engineers, among others, also provides an error estimate for the integral of the regression function. The kriging method is explored and some of its statistical characteristics are described. The Watson method and theory are extended so that the kriging features are displayed. Theoretical and computational comparisons of the kriging and Watson approaches are offered.
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
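The multicollinearity cause is easy to reproduce: two highly correlated cost drivers that both raise cost can yield a fitted coefficient with a counterintuitive sign, even though the simple correlation with cost is positive. A small synthetic demonstration (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
driver = rng.normal(size=n)
weight = driver + rng.normal(0, 0.05, n)           # highly correlated regressors
cost = 2 * driver + 1 * weight + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), driver, weight])
beta = np.linalg.lstsq(X, cost, rcond=None)[0]
print("coefficients:", beta[1:])                   # one may come out negative
print("corr(weight, cost):", np.corrcoef(weight, cost)[0, 1])  # yet clearly positive
```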
Basis Selection for Wavelet Regression
NASA Technical Reports Server (NTRS)
Wheeler, Kevin R.; Lau, Sonie (Technical Monitor)
1998-01-01
A wavelet basis selection procedure is presented for wavelet regression. Both the basis and the threshold are selected using cross-validation. The method includes the capability of incorporating prior knowledge on the smoothness (or shape of the basis functions) into the basis selection procedure. The results of the method are demonstrated on sampled functions widely used in the wavelet regression literature. The results of the method are contrasted with other published methods.
Article mounting and position adjustment stage
Cutburth, R.W.; Silva, L.L.
1988-05-10
An improved adjustment and mounting stage of the type used for the detection of laser beams is disclosed. A ring sensor holder has locating pins on a first side thereof which are positioned within a linear keyway in a surrounding housing for permitting reciprocal movement of the ring along the keyway. A rotatable ring gear is positioned within the housing on the other side of the ring from the linear keyway and includes an oval keyway which drives the ring along the linear keyway upon rotation of the gear. Motor-driven single-stage and dual (x, y) stage adjustment systems are disclosed which are of compact construction and include a large laser transmission hole. 6 figs.
Regression Discontinuity Designs in Epidemiology
Moscoe, Ellen; Mutevedzi, Portia; Newell, Marie-Louise; Bärnighausen, Till
2014-01-01
When patients receive an intervention based on whether they score below or above some threshold value on a continuously measured random variable, the intervention will be randomly assigned for patients close to the threshold. The regression discontinuity design exploits this fact to estimate causal treatment effects. In spite of its recent proliferation in economics, the regression discontinuity design has not been widely adopted in epidemiology. We describe regression discontinuity, its implementation, and the assumptions required for causal inference. We show that regression discontinuity is generalizable to the survival and nonlinear models that are mainstays of epidemiologic analysis. We then present an application of regression discontinuity to the much-debated epidemiologic question of when to start HIV patients on antiretroviral therapy. Using data from a large South African cohort (2007–2011), we estimate the causal effect of early versus deferred treatment eligibility on mortality. Patients whose first CD4 count was just below the 200 cells/μL CD4 count threshold had a 35% lower hazard of death (hazard ratio = 0.65 [95% confidence interval = 0.45–0.94]) than patients presenting with CD4 counts just above the threshold. We close by discussing the strengths and limitations of regression discontinuity designs for epidemiology. PMID:25061922
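A sharp regression discontinuity estimate is, in its simplest form, a local linear regression with a treatment indicator and an interaction, fit within a bandwidth of the threshold. A minimal sketch with synthetic data loosely echoing the CD4-threshold setup (the bandwidth, functional form and effect size are illustrative; the paper's actual analysis used survival models):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 2000
cd4 = rng.uniform(50, 350, n)                       # running variable
treated = (cd4 < 200).astype(int)                   # eligibility below the threshold
outcome = 0.01 * cd4 - 0.35 * treated + rng.normal(0, 0.5, n)

df = pd.DataFrame({"y": outcome, "t": treated, "r": cd4 - 200})
h = 50                                              # bandwidth around the threshold
local = df[df.r.abs() < h]
rdd = smf.ols("y ~ t + r + t:r", local).fit()       # local linear fit on both sides
print("effect at threshold:", rdd.params["t"])      # close to -0.35
```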
McKenzie, K.R.
1959-07-01
An electrode support which permits accurate alignment and adjustment of the electrode in a plurality of planes and about a plurality of axes in a calutron is described. The support will align the slits in the electrode with the slits of an ionizing chamber so as to provide for the egress of ions. The support comprises an insulator, a leveling plate carried by the insulator and having diametrically opposed attaching screws screwed to the plate and the insulator and diametrically opposed adjusting screws for bearing against the insulator, and an electrode associated with the plate for adjustment therewith.
Linear equality constraints in the general linear mixed model.
Edwards, L J; Stewart, P W; Muller, K E; Helms, R W
2001-12-01
Scientists may wish to analyze correlated outcome data with constraints among the responses. For example, piecewise linear regression in a longitudinal data analysis can require use of a general linear mixed model combined with linear parameter constraints. Although well developed for standard univariate models, there are no general results that allow a data analyst to specify a mixed model equation in conjunction with a set of constraints on the parameters. We resolve the difficulty by precisely describing conditions that allow specifying linear parameter constraints that insure the validity of estimates and tests in a general linear mixed model. The recommended approach requires only straightforward and noniterative calculations to implement. We illustrate the convenience and advantages of the methods with a comparison of cognitive developmental patterns in a study of individuals from infancy to early adulthood for children from low-income families.
Attachment style and adjustment to divorce.
Yárnoz-Yaben, Sagrario
2010-05-01
Divorce is becoming increasingly widespread in Europe. In this study, I present an analysis of the role played by attachment style (secure, dismissing, preoccupied and fearful, plus the dimensions of anxiety and avoidance) in the adaptation to divorce. Participants comprised divorced parents (N = 40) from a medium-sized city in the Basque Country. The results reveal a lower proportion of people with secure attachment in the sample group of divorcees. Attachment style and dependence (emotional and instrumental) are closely related. I have also found associations between measures that showed a poor adjustment to divorce and the preoccupied and fearful attachment styles. Adjustment is related to a dismissing attachment style and to the avoidance dimension. Multiple regression analysis confirmed that secure attachment and the avoidance dimension predict adjustment to divorce and positive affectivity while preoccupied attachment and the anxiety dimension predicted negative affectivity. Implications for research and interventions with divorcees are discussed.
Shrinkage regression-based methods for microarray missing value imputation
2013-01-01
Background: Missing values commonly occur in microarray data, which often contain more than 5% missing values, with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than other types of methods on many test microarray datasets. Results: To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation on six test microarray datasets than the existing regression-based methods do. Conclusions: Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods. PMID:24565159
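The neighbour-selection-plus-regression recipe can be sketched as follows; ridge regression stands in here for the paper's shrinkage estimator, and the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(9)
data = rng.normal(size=(100, 30))                  # genes x arrays
data[1:] += 0.8 * data[0]                          # induce a correlation structure
target, miss = 0, 5                                # gene 0 is missing on array 5

obs = np.delete(np.arange(30), miss)               # observed arrays
corr = np.array([np.corrcoef(data[target, obs], data[g, obs])[0, 1]
                 for g in range(1, 100)])
nbrs = 1 + np.argsort(-np.abs(corr))[:10]          # 10 most similar genes

# shrinkage regression (ridge here) of the target gene on its neighbours
model = Ridge(alpha=1.0).fit(data[nbrs][:, obs].T, data[target, obs])
estimate = model.predict(data[nbrs][:, miss].reshape(1, -1))[0]
print("imputed value:", estimate, "true value:", data[target, miss])
```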
Kokuhu, Takatoshi; Fukushima, Keizo; Ushigome, Hidetaka; Yoshimura, Norio; Sugioka, Nobuyuki
2013-01-01
The optimal use and monitoring of cyclosporine A (CyA) have remained unclear and the current strategy of CyA treatment requires frequent dose adjustment following an empirical initial dosage adjusted for total body weight (TBW). The primary aim of this study was to evaluate age and anthropometric parameters as predictors for dose adjustment of CyA; and the secondary aim was to compare the usefulness of the concentration at predose (C0) and 2-hour postdose (C2) monitoring. An open-label, non-randomized, retrospective study was performed in 81 renal transplant patients in Japan during 2001-2010. The relationships between the area under the blood concentration-time curve (AUC0-9) of CyA and its C0 or C2 level were assessed with a linear regression analysis model. In addition to age, 7 anthropometric parameters were tested as predictors for AUC0-9 of CyA: TBW, height (HT), body mass index (BMI), body surface area (BSA), ideal body weight (IBW), lean body weight (LBW), and fat free mass (FFM). Correlations between AUC0-9 of CyA and these parameters were also analyzed with a linear regression model. The rank order of the correlation coefficient was C0 > C2 (C0; r=0.6273, C2; r=0.5562). The linear regression analyses between AUC0-9 of CyA and candidate parameters indicated their potential usefulness from the following rank order: IBW > FFM > HT > BSA > LBW > TBW > BMI > Age. In conclusion, after oral administration, C2 monitoring has a large variation and could be at high risk for overdosing. Therefore, after oral dosing of CyA, it was not considered to be a useful approach for single monitoring, but should rather be used with C0 monitoring. The regression analyses between AUC0-9 of CyA and anthropometric parameters indicated that IBW was potentially the superior predictor for dose adjustment of CyA in an empiric strategy using TBW (IBW; r=0.5181, TBW; r=0.3192); however, this finding seems to lack the pharmacokinetic rationale and thus warrants further basic and clinical
Remotely Adjustable Hydraulic Pump
NASA Technical Reports Server (NTRS)
Kouns, H. H.; Gardner, L. D.
1987-01-01
Outlet pressure adjusted to match varying loads. Electrohydraulic servo has positioned sleeve in leftmost position, adjusting outlet pressure to maximum value. Sleeve in equilibrium position, with control land covering control port. For lowest pressure setting, sleeve shifted toward right by increased pressure on sleeve shoulder from servovalve. Pump used in aircraft and robots, where hydraulic actuators repeatedly turned on and off, changing pump load frequently and over wide range.
Weighted triangulation adjustment
Anderson, Walter L.
1969-01-01
The variation of coordinates method is employed to perform a weighted least squares adjustment of horizontal survey networks. Geodetic coordinates are required for each fixed and adjustable station. A preliminary inverse geodetic position computation is made for each observed line. Weights associated with each observed equation for direction, azimuth, and distance are applied in the formation of the normal equations in the least squares adjustment. The number of normal equations that may be solved is twice the number of new stations and less than 150. When the normal equations are solved, shifts are produced at adjustable stations. Previously computed correction factors are applied to the shifts and a most probable geodetic position is found for each adjustable station. Final azimuths and distances are computed. These may be written onto magnetic tape for subsequent computation of state plane or grid coordinates. Input consists of punch cards containing project identification, program options, and position and observation information. Results listed include preliminary and final positions, residuals, observation equations, solution of the normal equations showing magnitudes of shifts, and a plot of each adjusted and fixed station. During processing, data sets containing irrecoverable errors are rejected and the type of error is listed. The computer then resumes processing of additional data sets. Other conditions cause warning errors to be issued, and processing continues with the current data set.
GENERALIZED PARTIALLY LINEAR MIXED-EFFECTS MODELS INCORPORATING MISMEASURED COVARIATES
Liang, Hua
2009-01-01
In this article we consider a semiparametric generalized mixed-effects model, and propose combining local linear regression, and penalized quasilikelihood and local quasilikelihood techniques to estimate both population and individual parameters and nonparametric curves. The proposed estimators take into account the local correlation structure of the longitudinal data. We establish normality for the estimators of the parameter and asymptotic expansion for the estimators of the nonparametric part. For practical implementation, we propose an appropriate algorithm. We also consider the measurement error problem in covariates in our model, and suggest a strategy for adjusting the effects of measurement errors. We apply the proposed models and methods to study the relation between virologic and immunologic responses in AIDS clinical trials, in which virologic response is classified into binary variables. A dataset from an AIDS clinical study is analyzed. PMID:20160899
Meta-Regression Approximations to Reduce Publication Selection Bias
ERIC Educational Resources Information Center
Stanley, T. D.; Doucouliagos, Hristos
2014-01-01
Publication selection bias is a serious challenge to the integrity of all empirical sciences. We derive meta-regression approximations to reduce this bias. Our approach employs Taylor polynomial approximations to the conditional mean of a truncated distribution. A quadratic approximation without a linear term, precision-effect estimate with…
Logarithmic Transformations in Regression: Do You Transform Back Correctly?
ERIC Educational Resources Information Center
Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.
2009-01-01
The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
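The pitfall the article addresses is that exponentiating the predicted mean of log(y) estimates the conditional median, not the conditional mean. Duan's smearing estimator is one standard correction, shown here as our example rather than necessarily the article's:

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.uniform(1, 10, 500)
y = np.exp(1 + 0.3 * x + rng.normal(0, 0.8, 500))  # lognormal errors

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
resid = np.log(y) - X @ beta

naive = np.exp(X @ beta)                           # biased low for the mean
smear = np.exp(X @ beta) * np.mean(np.exp(resid))  # Duan's smearing correction
print("true mean:", y.mean(), "naive:", naive.mean(), "smeared:", smear.mean())
```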
Notes sur les mouvements recursifs (Notes on Regressive Moves).
ERIC Educational Resources Information Center
Auchlin, Antoine; And Others
1981-01-01
Examines the phenomenon of regressive moves (retro-interpretation) in the light of a hypothesis according to which the formation of complex and hierarchically organized conversation units is subordinated to the linearity of discourse. Analyzes a transactional exchange, describing the interplay of integration, anticipation, and retro-interpretation…
A Demonstration of Regression False Positive Selection in Data Mining
ERIC Educational Resources Information Center
Pinder, Jonathan P.
2014-01-01
Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
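A standard diagnostic for the redundancy described here is the variance inflation factor (VIF); a minimal sketch with synthetic, deliberately parallel predictors (the physiological variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(11)
preload = rng.normal(size=80)
afterload = preload + rng.normal(0, 0.1, 80)       # changes in parallel with preload
heart_rate = rng.normal(size=80)                   # independently controlled

X = sm.add_constant(np.column_stack([preload, afterload, heart_rate]))
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print("VIFs:", vifs)   # values above ~10 are commonly taken to flag multicollinearity
```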
Interpretation of Standardized Regression Coefficients in Multiple Regression.
ERIC Educational Resources Information Center
Thayer, Jerome D.
The extent to which standardized regression coefficients (beta values) can be used to determine the importance of a variable in an equation was explored. The beta value and the part correlation coefficient--also called the semi-partial correlation coefficient and reported in squared form as the incremental "r squared"--were compared for variables…
Regressive Evolution in Astyanax Cavefish
Jeffery, William R.
2013-01-01
A diverse group of animals, including members of most major phyla, have adapted to life in the perpetual darkness of caves. These animals are united by the convergence of two regressive phenotypes, loss of eyes and pigmentation. The mechanisms of regressive evolution are poorly understood. The teleost Astyanax mexicanus is of special significance in studies of regressive evolution in cave animals. This species includes an ancestral surface dwelling form and many con-specific cave-dwelling forms, some of which have evolved their recessive phenotypes independently. Recent advances in Astyanax development and genetics have provided new information about how eyes and pigment are lost during cavefish evolution; namely, they have revealed some of the molecular and cellular mechanisms involved in trait modification, the number and identity of the underlying genes and mutations, the molecular basis of parallel evolution, and the evolutionary forces driving adaptation to the cave environment. PMID:19640230
Laplace regression with censored data.
Bottai, Matteo; Zhang, Jiajia
2010-08-01
We consider a regression model where the error term is assumed to follow a type of asymmetric Laplace distribution. We explore its use in the estimation of conditional quantiles of a continuous outcome variable given a set of covariates in the presence of random censoring. Censoring may depend on covariates. Estimation of the regression coefficients is carried out by maximizing a non-differentiable likelihood function. In the scenarios considered in a simulation study, the Laplace estimator showed correct coverage and shorter computation time than the alternative methods considered, some of which occasionally failed to converge. We illustrate the use of Laplace regression with an application to survival time in patients with small cell lung cancer.
Interquantile Shrinkage in Regression Models
Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.
2012-01-01
Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
[Is regression of atherosclerosis possible?].
Thomas, D; Richard, J L; Emmerich, J; Bruckert, E; Delahaye, F
1992-10-01
Experimental studies have shown the regression of atherosclerosis in animals given a cholesterol-rich diet and then given a normal diet or hypolipidemic therapy. Despite favourable results of clinical trials of primary prevention modifying the lipid profile, the concept of atherosclerosis regression in man remains very controversial. The methodological approach is difficult: this is based on angiographic data and requires strict standardisation of angiographic views and reliable quantitative techniques of analysis which are available with image processing. Several methodologically acceptable clinical coronary studies have shown not only stabilisation but also regression of atherosclerotic lesions with reductions of about 25% in total cholesterol levels and of about 40% in LDL cholesterol levels. These reductions were obtained either by drugs as in CLAS (Cholesterol Lowering Atherosclerosis Study), FATS (Familial Atherosclerosis Treatment Study) and SCOR (Specialized Center of Research Intervention Trial), by profound modifications in dietary habits as in the Lifestyle Heart Trial, or by surgery (ileo-caecal bypass) as in POSCH (Program On the Surgical Control of the Hyperlipidemias). On the other hand, trials with non-lipid lowering drugs such as the calcium antagonists (INTACT, MHIS) have not shown significant regression of existing atherosclerotic lesions but only a decrease on the number of new lesions. The clinical benefits of these regression studies are difficult to demonstrate given the limited period of observation, relatively small population numbers and the fact that in some cases the subjects were asymptomatic. The decrease in the number of cardiovascular events therefore seems relatively modest and concerns essentially subjects who were symptomatic initially. The clinical repercussion of studies of prevention involving a single lipid factor is probably partially due to the reduction in progression and anatomical regression of the atherosclerotic plaque
Weaver, Virginia M.; Vargas, Gonzalo García; Silbergeld, Ellen K.; Rothenberg, Stephen J.; Fadrowski, Jeffrey J.; Rubio-Andrade, Marisela; Parsons, Patrick J.; Steuerwald, Amy J.; and others
2014-07-15
Positive associations between urine toxicant levels and measures of glomerular filtration rate (GFR) have been reported recently in a range of populations. The explanation for these associations, in a direction opposite that of traditional nephrotoxicity, is uncertain. Variation in associations by urine concentration adjustment approach has also been observed. Associations of urine cadmium, thallium and uranium in models of serum creatinine- and cystatin-C-based estimated GFR (eGFR) were examined using multiple linear regression in a cross-sectional study of adolescents residing near a lead smelter complex. Urine concentration adjustment approaches compared included urine creatinine, urine osmolality and no adjustment. Median age, blood lead and urine cadmium, thallium and uranium were 13.9 years, 4.0 μg/dL, 0.22, 0.27 and 0.04 g/g creatinine, respectively, in 512 adolescents. Urine cadmium and thallium were positively associated with serum creatinine-based eGFR only when urine creatinine was used to adjust for urine concentration (β coefficient=3.1 mL/min/1.73 m²; 95% confidence interval=1.4, 4.8 per each doubling of urine cadmium). Weaker positive associations, also only with urine creatinine adjustment, were observed between these metals and serum cystatin-C-based eGFR and between urine uranium and serum creatinine-based eGFR. Additional research using non-creatinine-based methods of adjustment for urine concentration is necessary.
Generalized Linear Models in Family Studies
ERIC Educational Resources Information Center
Wu, Zheng
2005-01-01
Generalized linear models (GLMs), as defined by J. A. Nelder and R. W. M. Wedderburn (1972), unify a class of regression models for categorical, discrete, and continuous response variables. As an extension of classical linear models, GLMs provide a common body of theory and methodology for some seemingly unrelated models and procedures, such as…
Colgate, S.A.
1958-05-27
An improvement is presented in linear accelerators for charged particles with respect to the stable focusing of the particle beam. The improvement consists of providing a radial electric field transverse to the accelerating electric fields and introducing the beam of particles into the field at an angle. The result of the foregoing is to achieve a beam which spirals about the axis of the acceleration path. The combination of the electric fields and the angular motion of the particles cooperates to provide a stable and focused particle beam.
Urinary arsenic concentration adjustment factors and malnutrition.
Nermell, Barbro; Lindberg, Anna-Lena; Rahman, Mahfuzar; Berglund, Marika; Persson, Lars Ake; El Arifeen, Shams; Vahter, Marie
2008-02-01
This study aims at evaluating the suitability of adjusting urinary concentrations of arsenic, or any other urinary biomarker, for variations in urine dilution by creatinine and specific gravity in a malnourished population. We measured the concentrations of metabolites of inorganic arsenic, creatinine and specific gravity in spot urine samples collected from 1466 individuals, 5-88 years of age, in Matlab, rural Bangladesh, where arsenic-contaminated drinking water and malnutrition are prevalent (about 30% of the adults had body mass index (BMI) below 18.5 kg/m(2)). The urinary concentrations of creatinine were low; on average 0.55 g/L in the adolescents and adults and about 0.35 g/L in the 5-12 years old children. Therefore, adjustment by creatinine gave much higher numerical values for the urinary arsenic concentrations than did the corresponding data expressed as microg/L, adjusted by specific gravity. As evaluated by multiple regression analyses, urinary creatinine, adjusted by specific gravity, was more affected by body size, age, gender and season than was specific gravity. Furthermore, urinary creatinine was found to be significantly associated with urinary arsenic, which further disqualifies the creatinine adjustment. PMID:17900556
ERIC Educational Resources Information Center
Dolan, Conor V.; Wicherts, Jelte M.; Molenaar, Peter C. M.
2004-01-01
We consider the question of how variation in the number and reliability of indicators affects the power to reject the hypothesis that the regression coefficients are zero in latent linear regression analysis. We show that power remains constant as long as the coefficient of determination remains unchanged. Any increase in the number of indicators…
Weighting Regressions by Propensity Scores
ERIC Educational Resources Information Center
Freedman, David A.; Berk, Richard A.
2008-01-01
Regressions can be weighted by propensity scores in order to reduce bias. However, weighting is likely to increase random error in the estimates, and to bias the estimated standard errors downward, even when selection mechanisms are well understood. Moreover, in some cases, weighting will increase the bias in estimated causal parameters. If…
Multiple Regression: A Leisurely Primer.
ERIC Educational Resources Information Center
Daniel, Larry G.; Onwuegbuzie, Anthony J.
Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researchers is interested in studies of predictability of phenomena of interest. This paper provides an introduction to…
Quantile Regression with Censored Data
ERIC Educational Resources Information Center
Lin, Guixian
2009-01-01
The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitation due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…
Support vector machines for classification and regression.
Brereton, Richard G; Lloyd, Gavin R
2010-02-01
The increasing interest in Support Vector Machines (SVMs) over the past 15 years is described. Methods are illustrated using simulated case studies, and 4 experimental case studies, namely mass spectrometry for studying pollution, near infrared analysis of food, thermal analysis of polymers and UV/visible spectroscopy of polyaromatic hydrocarbons. The basis of SVMs as two-class classifiers is shown with extensive visualisation, including learning machines, kernels and penalty functions. The influence of the penalty error and radial basis function radius on the model is illustrated. Multiclass implementations including one vs. all, one vs. one, fuzzy rules and Directed Acyclic Graph (DAG) trees are described. One-class Support Vector Domain Description (SVDD) is described and contrasted to conventional two- or multi-class classifiers. The use of Support Vector Regression (SVR) is illustrated including its application to multivariate calibration, and why it is useful when there are outliers and non-linearities.
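A minimal SVR sketch with scikit-learn, illustrating the epsilon-insensitive loss that gives SVR its usefulness when outliers and non-linearities are present (data and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(12)
x = rng.uniform(0, 5, 200)[:, None]
y = np.sinc(x).ravel() + rng.normal(0, 0.05, 200)  # non-linear signal plus noise
y[:5] += 2.0                                       # a few gross outliers

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(x, y)  # epsilon-insensitive loss
print("number of support vectors:", len(svr.support_))
```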
NASA Technical Reports Server (NTRS)
2006-01-01
[figure removed for brevity, see original site] Context image for PIA03667 Linear Clouds
These clouds are located near the edge of the south polar region. The cloud tops are the puffy white features in the bottom half of the image.
Image information: VIS instrument. Latitude -80.1N, Longitude 52.1E. 17 meter/pixel resolution.
Note: this THEMIS visual image has been neither radiometrically nor geometrically calibrated for this preliminary release. An empirical correction has been performed to remove instrumental effects. A linear shift has been applied in the cross-track and down-track direction to approximate spacecraft and planetary motion. Fully calibrated and geometrically projected images will be released through the Planetary Data System in accordance with Project policies at a later time.
NASA's Jet Propulsion Laboratory manages the 2001 Mars Odyssey mission for NASA's Office of Space Science, Washington, D.C. The Thermal Emission Imaging System (THEMIS) was developed by Arizona State University, Tempe, in collaboration with Raytheon Santa Barbara Remote Sensing. The THEMIS investigation is led by Dr. Philip Christensen at Arizona State University. Lockheed Martin Astronautics, Denver, is the prime contractor for the Odyssey project, and developed and built the orbiter. Mission operations are conducted jointly from Lockheed Martin and from JPL, a division of the California Institute of Technology in Pasadena.
Calculation of Solar Radiation by Using Regression Methods
NASA Astrophysics Data System (ADS)
Kızıltan, Ö.; Şahin, M.
2016-04-01
In this study, solar radiation was estimated at 53 locations over Turkey with varying climatic conditions using the linear, ridge, lasso, smoother, partial least squares, KNN and Gaussian process regression methods. The data of the years 2002 and 2003 were used to obtain the regression coefficients of the relevant methods. The coefficients were obtained based on the input parameters, which were month, altitude, latitude, longitude and land-surface temperature (LST). The values for LST were obtained from the data of the National Oceanic and Atmospheric Administration Advanced Very High Resolution Radiometer (NOAA-AVHRR) satellite. Solar radiation for the year 2004 was calculated using the coefficients obtained by the regression methods, and the results were compared statistically. The most successful method was the Gaussian process regression method, and the least successful was the lasso regression method. While the mean bias error (MBE) value of the Gaussian process regression method was 0.274 MJ/m², the root mean square error (RMSE) value of the method was calculated as 2.260 MJ/m². The correlation coefficient of the related method was calculated as 0.941. The statistical results are consistent with the literature. The Gaussian process regression method is recommended for other studies.
Mapping geogenic radon potential by regression kriging.
Pásztor, László; Szabó, Katalin Zsuzsanna; Szatmári, Gábor; Laborczi, Annamária; Horváth, Ákos
2016-02-15
Radon ((222)Rn) gas is produced in the radioactive decay chain of uranium ((238)U), an element that is naturally present in soils. Radon is transported mainly by diffusion and convection mechanisms through the soil, depending mainly on the physical and meteorological parameters of the soil, and can enter and accumulate in buildings. Health risks originating from indoor radon concentration can be attributed to natural factors and are characterized by the geogenic radon potential (GRP). Identification of areas with high health risks requires spatial modeling, that is, mapping of radon risk. In addition to geology and meteorology, physical soil properties play a significant role in the determination of GRP. In order to compile a reliable GRP map for a model area in Central Hungary, spatial auxiliary information representing GRP-forming environmental factors was taken into account to support the spatial inference of the locally measured GRP values. Since the number of measured sites was limited, efficient spatial prediction methodologies were sought to construct a reliable map for a larger area. Regression kriging (RK) was applied for the interpolation, using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. First, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. Overall accuracy of the map was tested by leave-one-out cross-validation. Furthermore, the spatial reliability of the resultant map is estimated by calculating the 90% prediction interval of the local prediction values. The applicability of the applied method as well as that of the map is discussed briefly. PMID:26706761
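The two-stage RK prediction (regression trend plus kriged residuals) can be sketched as follows. This is a minimal illustration with made-up coordinates and a single covariate; it assumes the third-party pykrige package for the residual kriging step and is not the authors' workflow:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from pykrige.ok import OrdinaryKriging            # assumed available (pip install pykrige)

rng = np.random.default_rng(13)
lon, lat = rng.uniform(18, 20, 60), rng.uniform(46, 48, 60)
soil = rng.normal(size=60)                         # spatially exhaustive covariate
grp = 5 + 2 * soil + np.sin(lon * 3) + rng.normal(0, 0.3, 60)

# 1) deterministic component: regression on the auxiliary covariate
trend = LinearRegression().fit(soil[:, None], grp)
resid = grp - trend.predict(soil[:, None])

# 2) stochastic component: ordinary kriging of the regression residuals
ok = OrdinaryKriging(lon, lat, resid, variogram_model="spherical")
r_hat, var = ok.execute("points", np.array([19.0]), np.array([47.0]))

soil_new = np.array([[0.2]])                       # covariate value at the new site
print("RK prediction:", trend.predict(soil_new)[0] + r_hat[0])
```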
Monthly streamflow forecasting using Gaussian Process Regression
NASA Astrophysics Data System (ADS)
Sun, Alexander Y.; Wang, Dingbao; Xu, Xianli
2014-04-01
Streamflow forecasting plays a critical role in nearly all aspects of water resources planning and management. In this work, Gaussian Process Regression (GPR), an effective kernel-based machine learning algorithm, is applied to probabilistic streamflow forecasting. GPR is built on the Gaussian process, a stochastic process that generalizes the multivariate Gaussian distribution to infinite-dimensional spaces such that distributions over function values can be defined. The GPR algorithm provides a tractable and flexible hierarchical Bayesian framework for inferring the posterior distribution of streamflows. The prediction skill of the algorithm is tested for one-month-ahead prediction using the MOPEX database, which includes long-term hydrometeorological time series collected from 438 basins across the U.S. from 1948 to 2003. Comparisons with linear regression and artificial neural network models indicate that GPR outperforms both regression methods in most cases. The GPR predictions for MOPEX basins are further examined using the Budyko framework, which helps to reveal the close relationships among water-energy partitioning, hydrologic similarity, and predictability. Flow regime modification and the resulting loss of predictability have been a major concern in recent years because of climate change and anthropogenic activities. The persistence of streamflow predictability is thus examined by extending the original MOPEX data records to 2012. Results indicate relatively strong persistence of streamflow predictability in the extended period, although the low-predictability basins tend to show more variation. Because many low-predictability basins are located in regions experiencing rapid growth of human activities, the significance of sustainable development and water resources management can be even greater for those regions.
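A minimal sketch of one-month-ahead probabilistic forecasting with GPR, assuming lagged monthly flows as predictors and a synthetic series in place of the MOPEX data; the posterior standard deviation returned by the model is what makes the forecast probabilistic.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
months = np.arange(300)
flow = 10 + 5 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1.0, months.size)

lag = 3                                  # predict month t from months t-3..t-1
X = np.array([flow[i:i + lag] for i in range(flow.size - lag)])
y = flow[lag:]
X_train, y_train, X_test, y_test = X[:250], y[:250], X[250:], y[250:]

kernel = RBF(length_scale=5.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)
mean, std = gpr.predict(X_test, return_std=True)   # posterior mean and spread
coverage = np.mean(np.abs(y_test - mean) < 1.96 * std)
print(f"empirical 95% interval coverage: {coverage:.0%}")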
Shape regression for vertebra fracture quantification
NASA Astrophysics Data System (ADS)
Lund, Michael Tillge; de Bruijne, Marleen; Tanko, Laszlo B.; Nielsen, Mads
2005-04-01
Accurate and reliable identification and quantification of vertebral fractures constitute a challenge both in clinical trials and in the diagnosis of osteoporosis. Various efforts have been made to develop reliable, objective, and reproducible methods for assessing vertebral fractures, but at present there is no consensus concerning a universally accepted diagnostic definition of vertebral fractures. In this project we investigate whether it is possible to accurately reconstruct the shape of a normal vertebra using a neighbouring vertebra as prior information. The reconstructed shape can then be used to develop a novel vertebral fracture measure, by comparing the segmented vertebra shape with its reconstructed normal shape. The vertebrae in lateral X-rays of the lumbar spine were manually annotated by a medical expert. With this dataset we built a shape model with equidistant point distribution between the four corner points. Based on the shape model, a multiple linear regression model of normal vertebra shape was developed for each dataset using leave-one-out cross-validation, and the reconstructed shape was calculated for each dataset using these regression models. The prediction error relative to the annotated shape was on average 3%.
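A minimal sketch of that leave-one-out protocol under stated assumptions: hypothetical landmark coordinates of a neighbouring vertebra predict the target vertebra's landmarks through a multiple linear regression, refitted with each case held out in turn; all shapes are synthetic.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(3)
n_cases, n_coords = 30, 24              # e.g. 12 (x, y) landmarks per vertebra
neighbour = rng.normal(size=(n_cases, n_coords)) + 5.0
target = 0.8 * neighbour + rng.normal(0, 0.1, (n_cases, n_coords))

errors = []
for train, test in LeaveOneOut().split(neighbour):
    model = LinearRegression().fit(neighbour[train], target[train])
    pred = model.predict(neighbour[test])
    errors.append(np.mean(np.abs(pred - target[test]) / np.abs(target[test])))
print(f"mean relative prediction error: {np.mean(errors):.1%}")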
Regression Analysis Of Zernike Polynomials Part II
NASA Astrophysics Data System (ADS)
Grey, Louis D.
1989-01-01
In an earlier paper, "Regression Analysis of Zernike Polynomials" (Proceedings of SPIE, Vol. 18, pp. 392-398), the least-squares fitting of Zernike polynomials was examined from the point of view of linear statistical regression theory. Among the topics discussed were measures for determining how good the fit was, tests for the underlying assumptions of normality and constant variance, the treatment of outliers, the analysis of residuals, and the computation of confidence intervals for the coefficients. The present paper is a continuation of the earlier one and concerns applications of relatively new advances in certain areas of statistical theory made possible by the advent of the high-speed computer. Among these are: 1. Jackknife - a technique for improving the accuracy of any statistical estimate. 2. Bootstrap - increasing the accuracy of an estimate by generating new samples of data from some given set. 3. Cross-validation - the division of a data set into two halves, the first half of which is used to fit the model and the second half to see how well the fitted model predicts the data. The exposition is mainly by examples.
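A minimal sketch of the bootstrap idea applied to regression coefficients, with a simple polynomial fit standing in for a Zernike fit (the resampling logic is the same): resample the data with replacement, refit, and read confidence intervals off the empirical coefficient distribution.

import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 80)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.2, x.size)
design = np.vander(x, 3, increasing=True)      # columns: 1, x, x^2

boot = []
for _ in range(2000):
    idx = rng.integers(0, x.size, x.size)      # resample with replacement
    coef, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
    boot.append(coef)
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
print("95% bootstrap CIs per coefficient:", list(zip(lo.round(3), hi.round(3))))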
Path Linearity of Elite Swimmers in a 400 m Front Crawl Competition
Gatta, Giorgio; Cortesi, Matteo; Lucertini, Francesco; Piero, Benelli; Sisti, Davide; Fantozzi, Silvia
2015-01-01
In the front crawl, the propulsive action of the limbs causes lateral fluctuations from the straight path, which can theoretically be regarded as the best time-saving path of the race. The purpose of the present work was to analyze the head trajectory of 10 elite athletes during a 400 m front crawl competition, in order to provide information regarding the path linearity of elite swimmers. The kinematic analysis of the head trajectories was performed by means of stereo-photogrammetry. Results showed that forward speed and lateral fluctuation speed are linearly related. Multiple regression analysis of the discrete Fourier transform made it possible to distinguish three spectral windows identifying three specific features: strokes (0.7-5 Hz), breathing (0.4-0.7 Hz), and voluntary adjustments (0-0.4 Hz), which contributed to the energy wastage by 55%, 10%, and 35%, respectively. Both the race speed and the speed wastage of elite swimmers increase while progressing from the 1st to the 8th length during a 400 m front crawl official competition. The main sources of the lateral fluctuations that lead to the increasing speed wastage could be significantly attributed to strokes and voluntary adjustments, while the contribution of breathing did not reach statistical significance. In conclusion, strokes and voluntary adjustments are the main energy-consuming events that affect path linearity. Key points: The lateral fluctuations (LF) are indexes of elite swimmers' performance during 400 m competitions. The voluntary adjustments needed to return to the ideal trajectory are more energy-consuming than the movements the swimmer makes to maintain path linearity. The divergence from the ideal swimming trajectory during a high-level competition explains about 14.7% of the variation in average forward velocity during the race. PMID:25729292
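A minimal sketch of the spectral-window decomposition under stated assumptions (a made-up lateral-displacement signal and sampling rate; the three frequency bands follow the abstract):

import numpy as np

fs = 25.0                                  # assumed sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(5)
lateral = (0.05 * np.sin(2 * np.pi * 0.2 * t)    # voluntary adjustments
           + 0.02 * np.sin(2 * np.pi * 0.55 * t) # breathing
           + 0.04 * np.sin(2 * np.pi * 1.5 * t)  # strokes
           + 0.005 * rng.normal(size=t.size))

freqs = np.fft.rfftfreq(t.size, 1 / fs)
power = np.abs(np.fft.rfft(lateral)) ** 2
bands = {"adjustments (0-0.4 Hz)": (0.0, 0.4),
         "breathing (0.4-0.7 Hz)": (0.4, 0.7),
         "strokes (0.7-5 Hz)": (0.7, 5.0)}
total = power[(freqs > 0) & (freqs <= 5.0)].sum()
for name, (f1, f2) in bands.items():
    e = power[(freqs > f1) & (freqs <= f2)].sum()
    print(f"{name}: {100 * e / total:.0f}% of band-limited energy")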
Deep Wavelet Scattering for Quantum Energy Regression
NASA Astrophysics Data System (ADS)
Hirn, Matthew
Physical functionals are usually computed as solutions of variational problems or from solutions of partial differential equations, which may require huge computations for complex systems. Quantum chemistry calculations of ground-state molecular energies are one such example. Indeed, if x is a quantum molecular state, then the ground-state energy E_0(x) is the minimum-eigenvalue solution of the time-independent Schrödinger equation, which is computationally intensive for large systems. Machine learning algorithms do not simulate the physical system but estimate solutions by interpolating values provided by a training set of known examples {(x_i, E_0(x_i))}_{i ≤ n}. However, precise interpolations may require a number of examples that is exponential in the system dimension, and are thus intractable. This curse of dimensionality may be circumvented by computing interpolations in smaller approximation spaces, which take advantage of physical invariants. Linear regressions of E_0 over a dictionary Φ = {φ_k}_k compute an approximation Ẽ_0 as Ẽ_0(x) = Σ_k w_k φ_k(x), where the weights {w_k}_k are selected to minimize the error between E_0 and Ẽ_0 on the training set. The key to such a regression approach then lies in the design of the dictionary Φ. It must be intricate enough to capture the essential variability of E_0(x) over the molecular states x of interest, while simple enough that evaluating Φ(x) is significantly less intensive than a direct quantum mechanical computation (or approximation) of E_0(x). In this talk we present a novel dictionary Φ for the regression of quantum mechanical energies, based on the scattering transform of an intermediate, approximate electron density representation ρ_x of the state x. The scattering transform has the architecture of a deep convolutional network, composed of an alternating sequence of linear filters and nonlinear maps. Whereas in many deep learning tasks the linear filters are learned from the training data, here
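A minimal sketch of the regression step Ẽ_0(x) = Σ_k w_k φ_k(x): with the dictionary evaluations arranged in a matrix, the weights are simply the least-squares solution. A random matrix stands in for the scattering dictionary, which is not reproduced here.

import numpy as np

rng = np.random.default_rng(6)
n_train, n_feat = 100, 20
Phi = rng.normal(size=(n_train, n_feat))      # Phi[i, k] = phi_k(x_i)
w_true = rng.normal(size=n_feat)
E0 = Phi @ w_true + rng.normal(0, 0.01, n_train)   # "training energies"

w, *_ = np.linalg.lstsq(Phi, E0, rcond=None)  # minimizes ||E0 - Phi w||^2
E0_hat = Phi @ w
print("training RMSE:", np.sqrt(np.mean((E0 - E0_hat) ** 2)))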
Regression Verification Using Impact Summaries
NASA Technical Reports Server (NTRS)
Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana
2013-01-01
Regression verification techniques are used to prove equivalence of syntactically similar programs. Checking equivalence of large programs, however, can be computationally expensive. Existing regression verification techniques rely on abstraction and decomposition techniques to reduce the computational effort of checking equivalence of the entire program. These techniques are sound but not complete. In this work, we propose a novel approach to improve scalability of regression verification by classifying the program behaviors generated during symbolic execution as either impacted or unimpacted. Our technique uses a combination of static analysis and symbolic execution to generate summaries of impacted program behaviors. The impact summaries are then checked for equivalence using an off-the-shelf decision procedure. We prove that our approach is both sound and complete for sequential programs, with respect to the depth bound of symbolic execution. Our evaluation on a set of sequential C artifacts shows that reducing the size of the summaries can help reduce the cost of software equivalence checking. Various reduction, abstraction, and compositional techniques have been developed to help scale software verification techniques to industrial-sized systems. Although such techniques have greatly increased the size and complexity of systems that can be checked, analysis of large software systems remains costly. Regression analysis techniques, e.g., regression testing [16], regression model checking [22], and regression verification [19], restrict the scope of the analysis by leveraging the differences between program versions. These techniques are based on the idea that if code is checked early in development, then subsequent versions can be checked against a prior (checked) version, leveraging the results of the previous analysis to reduce analysis cost of the current version. Regression verification addresses the problem of proving equivalence of closely related program
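Not the impact-summary technique itself, but a minimal sketch of the underlying equivalence check it builds on: assert that two program versions disagree on some input and let an off-the-shelf decision procedure (z3 is used here as an assumed stand-in) search for one; unsatisfiability means the versions are equivalent.

from z3 import BitVec, Solver, unsat

def version_a(x):
    return x * 2            # original program fragment

def version_b(x):
    return x + x            # syntactically different revision

x = BitVec("x", 32)         # model a 32-bit machine integer
s = Solver()
s.add(version_a(x) != version_b(x))   # is there a distinguishing input?
print("equivalent" if s.check() == unsat else f"counterexample: {s.model()}")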
Simple, Internally Adjustable Valve
NASA Technical Reports Server (NTRS)
Burley, Richard K.
1990-01-01
Valve containing simple in-line, adjustable, flow-control orifice made from ordinary plumbing fitting and two Allen setscrews. Construction of valve requires only simple drilling, tapping, and grinding. Orifice installed in existing fitting, avoiding changes in rest of plumbing.
NASA Technical Reports Server (NTRS)
1986-01-01
Corning Glass Works' Serengeti Driver sunglasses are unique in that their lenses self-adjust and filter light while suppressing glare. They eliminate more than 99% of the ultraviolet rays in sunlight. The frames are based on the NASA Anthropometric Source Book.
ERIC Educational Resources Information Center
Abramson, Jane A.
Personal interviews with 100 former farm operators living in Saskatoon, Saskatchewan, were conducted in an attempt to understand the nature of the adjustment process caused by migration from rural to urban surroundings. Requirements for inclusion in the study were that respondents had owned or operated a farm for at least 3 years, had left their…
Hunter, Steven L.
2002-01-01
An inclinometer utilizing synchronous demodulation for high resolution and electronic offset adjustment provides a wide dynamic range without any moving components. A device encompassing a tiltmeter and accompanying electronic circuitry provides quasi-leveled tilt sensors that detect highly resolved tilt change without signal saturation.
2011-01-01
Background: Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque-estimation models are degradation of model accuracy with the passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods: Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours after the completion of the first data-gathering session, for the purpose of evaluating the effects of the passage of time and electrode displacement on the accuracy of the models. Acquired SEMG signals were filtered, rectified, normalized and then fed to the models for training. Results: Mean adjusted coefficient of determination (R_a²) values decreased by 20%-35% for the different models after one hour, while altering arm posture decreased mean R_a² values by 64%-74%. Conclusions: Model estimation accuracy drops significantly with the passage of time, electrode displacement, and alteration of limb posture; model retraining is therefore crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, the ordinary least squares linear regression model (OLS) was shown to have high isometric torque estimation accuracy combined with very short training times. PMID:21943179
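A minimal sketch of the OLS baseline with the adjusted R² used as the score, on synthetic stand-ins for the eight rectified, normalized SEMG channels:

import numpy as np

rng = np.random.default_rng(7)
n, k = 500, 8                                  # samples, SEMG channels
emg = np.abs(rng.normal(size=(n, k)))          # rectified channel envelopes
emg /= emg.max(axis=0)                         # per-channel normalization
torque = emg @ rng.uniform(0.5, 2.0, k) + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), emg])         # add intercept
beta, *_ = np.linalg.lstsq(X, torque, rcond=None)
resid = torque - X @ beta
r2 = 1 - resid.var() / torque.var()
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # adjusted coefficient of determination
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")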
Risk-adjusted outcome models for public mental health outpatient programs.
Hendryx, M S; Dyck, D G; Srebnik, D
1999-01-01
OBJECTIVE: To develop and test risk-adjustment outcome models in publicly funded mental health outpatient settings. We developed prospective risk models that used demographic and diagnostic variables; client-reported functioning, satisfaction, and quality of life; and case manager clinical ratings to predict subsequent client functional status, health-related quality of life, and satisfaction with services. DATA SOURCES/STUDY SETTING: Data collected from 289 adult clients at five- and ten-month intervals, from six community mental health agencies in Washington state located primarily in suburban and rural areas. Data sources included client self-report, case manager ratings, and management information system data. STUDY DESIGN: Model specifications were tested using prospective linear regression analyses. Models were validated in a separate sample and comparative agency performance examined. PRINCIPAL FINDINGS: Presence of severe diagnoses, substance abuse, client age, and baseline functional status and quality of life were predictive of mental health outcomes. Unadjusted versus risk-adjusted scores resulted in differently ranked agency performance. CONCLUSIONS: Risk-adjusted functional status and patient satisfaction outcome models can be developed for public mental health outpatient programs. Research is needed to improve the predictive accuracy of the outcome models developed in this study, and to develop techniques for use in applied settings. The finding that risk adjustment changes comparative agency performance has important consequences for quality monitoring and improvement. Issues in public mental health risk adjustment are discussed, including static versus dynamic risk models, utilization versus outcome models, choice and timing of measures, and access and quality improvement incentives. PMID:10201857
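A minimal sketch of why risk adjustment can re-rank providers, using synthetic data: agencies are ranked on raw mean outcomes and again on observed-minus-expected residuals from a prospective linear risk model (the casemix variables below are hypothetical).

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)
n = 289
agency = rng.integers(0, 6, n)                       # six agencies
baseline = rng.normal(50, 10, n)                     # baseline functioning
severe_dx = rng.binomial(1, 0.3, n)                  # severe diagnosis flag
outcome = (0.7 * baseline - 5 * severe_dx
           + np.array([0, 1, -1, 2, 0, -2])[agency]  # true agency effects
           + rng.normal(0, 5, n))

casemix = np.column_stack([baseline, severe_dx])
risk = LinearRegression().fit(casemix, outcome)      # prospective risk model
residual = outcome - risk.predict(casemix)           # observed minus expected

raw_rank = np.argsort([-outcome[agency == a].mean() for a in range(6)])
adj_rank = np.argsort([-residual[agency == a].mean() for a in range(6)])
print("unadjusted ranking:", raw_rank)
print("risk-adjusted ranking:", adj_rank)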
Drift tube suspension for high intensity linear accelerators
Liska, Donald J.; Schamaun, Roger G.; Clark, Donald C.; Potter, R. Christopher; Frank, Joseph A.
1982-01-01
The disclosure relates to a drift tube suspension for high intensity linear accelerators. The system comprises a series of box-section girders independently adjustably mounted on a linear accelerator. A plurality of drift tube holding stems are individually adjustably mounted on each girder.
Weaver, Virginia M; Vargas, Gonzalo García; Silbergeld, Ellen K; Rothenberg, Stephen J; Fadrowski, Jeffrey J; Rubio-Andrade, Marisela; Parsons, Patrick J; Steuerwald, Amy J; Navas-Acien, Ana; Guallar, Eliseo
2014-07-01
Positive associations between urine toxicant levels and measures of glomerular filtration rate (GFR) have been reported recently in a range of populations. The explanation for these associations, in a direction opposite that of traditional nephrotoxicity, is uncertain. Variation in associations by urine concentration adjustment approach has also been observed. Associations of urine cadmium, thallium and uranium in models of serum creatinine- and cystatin-C-based estimated GFR (eGFR) were examined using multiple linear regression in a cross-sectional study of adolescents residing near a lead smelter complex. Urine concentration adjustment approaches compared included urine creatinine, urine osmolality and no adjustment. Median age, blood lead and urine cadmium, thallium and uranium were 13.9 years, 4.0 μg/dL, 0.22, 0.27 and 0.04 g/g creatinine, respectively, in 512 adolescents. Urine cadmium and thallium were positively associated with serum creatinine-based eGFR only when urine creatinine was used to adjust for urine concentration (β coefficient=3.1 mL/min/1.73 m²; 95% confidence interval=1.4, 4.8 per each doubling of urine cadmium). Weaker positive associations, also only with urine creatinine adjustment, were observed between these metals and serum cystatin-C-based eGFR and between urine uranium and serum creatinine-based eGFR. Additional research using non-creatinine-based methods of adjustment for urine concentration is necessary. PMID:24815335
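A minimal sketch of the comparison described: the same eGFR model refitted under three urine-concentration adjustments (creatinine-adjusted, crudely osmolality-adjusted, unadjusted). Using log2 of the exposure yields a β per doubling, matching how the abstract reports effects; all data are synthetic.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)
n = 512
cadmium = rng.lognormal(-1.5, 0.5, n)          # urine Cd
creatinine = rng.lognormal(0.0, 0.4, n)        # urine creatinine
osmolality = rng.lognormal(5.6, 0.2, n)        # urine osmolality
egfr = 100 + rng.normal(0, 10, n)              # estimated GFR

adjustments = {
    "creatinine": cadmium / creatinine,
    "osmolality": cadmium / (osmolality / osmolality.mean()),
    "none": cadmium,
}
for name, exposure in adjustments.items():
    X = np.log2(exposure).reshape(-1, 1)       # beta is per doubling of exposure
    beta = LinearRegression().fit(X, egfr).coef_[0]
    print(f"{name:10s} beta = {beta:+.2f} mL/min/1.73 m^2 per doubling")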
Measuring Delay Discounting in Humans Using an Adjusting Amount Task.
Frye, Charles C J; Galizio, Ann; Friedel, Jonathan E; DeHart, W Brady; Odum, Amy L
2016-01-09
Delay discounting refers to a decline in the value of a reward when it is delayed relative to when it is immediately available. Delay discounting tasks are used to identify indifference points, which reflect equal preference for two dichotomous reward alternatives differing in both delay and magnitude. Indifference points are key to assessing the shape of a delay-discounting gradient because they allow us to isolate the effect of delay on value. For example, if at a 1-week delay and a maximum of $1,000 the indifference point is $700, we know that, for that participant, a 1-week delay corresponds to a 30% reduction in value. This video outlines an adjusting-amount delay-discounting task that identifies indifference points relatively quickly and is inexpensive and easy to administer. Once data have been collected, non-linear regression techniques are typically used to generate discounting curves. The steepness of the discounting curve reflects the degree of impulsive choice of a group or individual. These techniques have been used with a wide range of commodities and have identified populations that are relatively impulsive. For example, people with substance abuse problems discount delayed rewards more steeply than control participants. Although the degree of discounting varies as a function of the commodity examined, discounting of one commodity correlates with discounting of other commodities, which suggests that discounting may be a persistent pattern of behavior (1).
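A minimal sketch of the non-linear regression step, assuming the hyperbolic discounting model V = A/(1 + kD) that is standard in this literature (e.g., Mazur's model); the indifference points below are invented for illustration, and a steeper fitted k indicates more impulsive choice.

import numpy as np
from scipy.optimize import curve_fit

A = 1000.0                                    # delayed amount ($)
delays = np.array([7, 30, 90, 180, 365])      # delays in days (hypothetical)
indiff = np.array([700, 550, 380, 250, 150])  # indifference points ($)

def hyperbolic(D, k):
    return A / (1 + k * D)                    # discounted value at delay D

(k,), _ = curve_fit(hyperbolic, delays, indiff, p0=[0.01])
print(f"estimated discounting rate k = {k:.4f} per day")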
Optimal color temperature adjustment for mobile devices under varying illuminants
NASA Astrophysics Data System (ADS)
Choi, Kyungah; Suk, Hyeon-Jeong
2014-01-01
With the wide use of mobile devices, display color reproduction has become extremely important. The purpose of this study is to investigate the optimal color temperature for mobile displays under varying illuminants. The effects of the color temperature and the illuminance of ambient lighting on user preferences were observed. For the visual examination, a total of 19 nuanced whites were examined under 20 illuminants: 19 display stimuli with different color temperatures (2,500 K to 19,600 K) were presented on an iPad3 (New iPad), while the ambient illuminants ranged from 2,500 K to 19,800 K in color temperature and from 0 lx to 3,000 lx in illuminance. Supporting previous studies of color reproduction, a positive correlation was found between the color temperature of the illuminants and that of the optimal whites. However, the relationship was not linear. Based on assessments by 56 subjects, a regression equation was derived to predict the optimal color temperature adjustment under varying illuminants: Display T_cp = 5138.93 log(Illuminant T_cp) − 11956.59 (p < .001, R² = 0.94). Moreover, the influence of an illuminant was positively correlated with the illuminance level, confirming the findings of previous studies. It is expected that the findings of this study can be used as a theoretical basis when designing a color strategy for mobile display devices.
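A minimal sketch applying the reported regression. The abstract writes "log" without a base; base 10 is assumed here and should be verified against the paper before use.

import math

def optimal_display_tcp(illuminant_tcp_k):
    """Preferred display color temperature (K) for a given ambient CCT (K)."""
    return 5138.93 * math.log10(illuminant_tcp_k) - 11956.59

for cct in (2500, 6500, 19800):
    print(f"ambient {cct} K -> display {optimal_display_tcp(cct):.0f} K")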
3D Regression Heat Map Analysis of Population Study Data.
Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard
2016-01-01
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease. PMID:26529689
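A minimal sketch of the exhaustive regression pass that the heat map visualizes: one model per combination of two or three independent features against a binary target, storing a goodness-of-fit value per cell (feature names and data are synthetic; the 3D visual encoding itself is omitted).

import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
features = ["age", "bmi", "glucose", "alcohol", "activity"]
n = 400
X = rng.normal(size=(n, len(features)))
target = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 1.0, n) > 0).astype(int)

scores = {}
for r in (2, 3):                                    # two- and three-feature cells
    for combo in itertools.combinations(range(len(features)), r):
        cols = list(combo)
        model = LogisticRegression().fit(X[:, cols], target)
        proba = model.predict_proba(X[:, cols])[:, 1]
        scores[tuple(features[i] for i in cols)] = roc_auc_score(target, proba)

best = max(scores, key=scores.get)
print("best feature combination:", best, f"AUC = {scores[best]:.2f}")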
Nagai, Mika; Konno, Yoshihiro; Satsukawa, Masahiro; Yamashita, Shinji; Yoshinari, Kouichi
2016-08-01
Drug-drug interactions (DDIs) via cytochrome P450 (P450) induction are one clinical problem leading to increased risk of adverse effects and the need for dosage adjustments and additional therapeutic monitoring. In silico models for predicting P450 induction are useful for avoiding DDI risk. In this study, we have established regression models for CYP3A4 and CYP2B6 induction in human hepatocytes using several physicochemical parameters for a set of azole compounds with different P450 induction characteristics as model compounds. To obtain well-correlated regression models, the compounds for CYP3A4 or CYP2B6 induction were independently selected from the tested azole compounds using principal component analysis with fold-induction data. The multiple linear regression models obtained for CYP3A4 and CYP2B6 induction are represented by different sets of physicochemical parameters, with adjusted coefficients of determination of 0.8 and 0.9, respectively. The fold-induction of the validation compounds, another set of 12 azole-containing compounds, was predicted within twofold limits for both CYP3A4 and CYP2B6. The concordance for the prediction of CYP3A4 induction was 87% with another validation set of 23 marketed drugs; however, the prediction of CYP2B6 induction tended to be overestimated for these marketed drugs. The regression models show that lipophilicity contributes most to CYP3A4 induction, whereas not only lipophilicity but also molecular polarity is important for CYP2B6 induction. Our regression models, especially that for CYP3A4 induction, might provide useful methods for avoiding potent CYP3A4 or CYP2B6 inducers during the lead optimization stage without performing induction assays in human hepatocytes.
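A minimal sketch of such a descriptor-based model with the twofold-limit check, under stated assumptions: logP stands in for the lipophilicity term and TPSA for a polarity term, and all values are synthetic stand-ins for the measured fold-induction data.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n = 24
logP = rng.uniform(0, 5, n)                 # lipophilicity descriptor
tpsa = rng.uniform(20, 120, n)              # polarity proxy
fold = 1 + 2.0 * logP - 0.01 * tpsa + rng.normal(0, 0.5, n)
fold = np.clip(fold, 1, None)               # induction is at least 1-fold

X = np.column_stack([logP, tpsa])
model = LinearRegression().fit(X[:12], fold[:12])    # training compounds
pred = np.clip(model.predict(X[12:]), 1, None)       # validation compounds
within_twofold = np.mean((pred / fold[12:] < 2) & (fold[12:] / pred < 2))
print(f"validation compounds within twofold limits: {within_twofold:.0%}")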