Sample records for time linear regression

  1. Transmission of linear regression patterns between time series: From relationship in time series to complex networks

    NASA Astrophysics Data System (ADS)

    Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui

    2014-07-01

    The linear regression parameters between two time series can differ with the length of the observation period. If we study the whole period with a short sliding window, the change in the linear regression parameters is a process of dynamic transmission over time. We present a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequencies of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which show the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
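
    The sliding-window idea in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: patterns are labelled by the slope interval alone (the paper also combines the significance-test result, which is omitted here), and the names `slope_pattern`, `transmission_network` and the interval `width` are hypothetical, not taken from the paper.

```python
from collections import Counter


def ols(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def slope_pattern(slope, width=1.0):
    """Hypothetical pattern label: the slope interval [k*width, (k+1)*width)."""
    k = int(slope // width)
    return f"[{k * width:g},{(k + 1) * width:g})"


def transmission_network(x, y, window, width=1.0):
    """Slide a window over the series and count pattern-to-pattern transitions.

    Returns the sequence of patterns (nodes in time order) and a Counter that
    maps each directed edge (pattern_t, pattern_t+1) to its frequency (weight).
    """
    patterns = []
    for start in range(len(x) - window + 1):
        slope, _ = ols(x[start:start + window], y[start:start + window])
        patterns.append(slope_pattern(slope, width))
    edges = Counter(zip(patterns, patterns[1:]))
    return patterns, edges


# Made-up series: slope 0.5 at first, slope 2.5 later.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.0, 0.5, 1.0, 1.5, 4.0, 6.5, 9.0, 11.5]
patterns, edges = transmission_network(x, y, window=3)
for (src, dst), weight in edges.items():
    print(f"{src} -> {dst}: {weight}")
```

    The edge weights are simply transition counts, so measures such as weighted out-degree can be computed directly from the returned `Counter`.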

  2. Transmission of linear regression patterns between time series: from relationship in time series to complex networks.

    PubMed

    Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui

    2014-07-01

    The linear regression parameters between two time series can differ with the length of the observation period. If we study the whole period with a short sliding window, the change in the linear regression parameters is a process of dynamic transmission over time. We present a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequencies of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which show the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.

  3. An Expert System for the Evaluation of Cost Models

    DTIC Science & Technology

    1990-09-01

    ...contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423 ... normal. (Reference: Applied Linear Regression Models by John Neter - page 125) ... over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (Reference: Applied Linear Regression Models by John

  4. [Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

    PubMed

    Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

    2017-05-10

    We described the time trend of the acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 with the Cochran-Armitage trend (CAT) test and linear regression analysis, and compared the results. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and the age-specific incidence trends (Cochran-Armitage trend P value

  5. CO2 flux determination by closed-chamber methods can be seriously biased by inappropriate application of linear regression

    NASA Astrophysics Data System (ADS)

    Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.

    2007-07-01

    Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. 
    CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
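
    The underestimation described here can be reproduced with a small numerical sketch. It assumes a generic saturating curve c(t) = c_inf + (c0 - c_inf) * exp(-k * t) as a stand-in for the authors' exponential model, whose exact form is not given in the abstract; the parameter values and sampling interval are made up for illustration.

```python
import math


def ols_slope(t, c):
    """Least-squares slope of concentration c against time t."""
    n = len(t)
    mt, mc = sum(t) / n, sum(c) / n
    sxy = sum((ti - mt) * (ci - mc) for ti, ci in zip(t, c))
    sxx = sum((ti - mt) ** 2 for ti in t)
    return sxy / sxx


def chamber_curve(t, c0, c_inf, k):
    """Assumed saturating headspace concentration (illustrative model only)."""
    return c_inf + (c0 - c_inf) * math.exp(-k * t)


# Made-up parameters: ppm, ppm, 1/s.
C0, C_INF, K = 400.0, 800.0, 0.01

# Two-minute closure, one reading every 10 s.
times = list(range(0, 121, 10))
conc = [chamber_curve(t, C0, C_INF, K) for t in times]

# The flux is proportional to dc/dt at closure time (t = 0).
initial_slope = K * (C_INF - C0)
linear_slope = ols_slope(times, conc)
ratio = linear_slope / initial_slope

print(f"initial slope: {initial_slope:.2f} ppm/s")
print(f"linear-regression slope: {linear_slope:.2f} ppm/s")
print(f"ratio: {ratio:.2f}")
```

    Because the assumed curve flattens over time, the straight-line slope over the whole closure period is always smaller than the slope at closure, which is the direction of bias the abstract reports.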

  6. Estimating monotonic rates from biological data using local linear regression.

    PubMed

    Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R

    2017-03-01

    Addressing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.

  7. Guidelines and Procedures for Computing Time-Series Suspended-Sediment Concentrations and Loads from In-Stream Turbidity-Sensor and Streamflow Data

    USGS Publications Warehouse

    Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.

    2009-01-01

    In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
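
    As a rough illustration of the first and last steps of this procedure, the sketch below fits a simple linear regression of suspended-sediment concentration (SSC) on turbidity and converts a computed concentration to a load. The calibration values are invented, the regression is left untransformed for brevity, and the MSPE check and the multiple-regression step are omitted. The load formula assumes streamflow in cubic feet per second and SSC in milligrams per liter, with the customary 0.0027 factor giving tons per day.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical calibration samples: turbidity reading and measured SSC (mg/L).
turbidity_cal = [10.0, 20.0, 40.0, 80.0]
ssc_cal = [25.0, 45.0, 85.0, 165.0]
slope, intercept = fit_line(turbidity_cal, ssc_cal)


def ssc_from_turbidity(turbidity):
    """Computed SSC (mg/L) from an instantaneous turbidity reading."""
    return intercept + slope * turbidity


def load_tons_per_day(streamflow_cfs, ssc_mg_per_l):
    """Suspended-sediment load, assuming ft^3/s and mg/L inputs."""
    return 0.0027 * streamflow_cfs * ssc_mg_per_l


ssc = ssc_from_turbidity(50.0)
load = load_tons_per_day(100.0, ssc)
print(f"SSC = {ssc:.1f} mg/L, load = {load:.2f} tons/day")
```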

  8. CO2 flux determination by closed-chamber methods can be seriously biased by inappropriate application of linear regression

    NASA Astrophysics Data System (ADS)

    Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.

    2007-11-01

    Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. 
    In particular, the effects of turbulence and pressure disturbances caused by chamber deployment are suspected to have produced the unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.

  9. What Is Wrong with ANOVA and Multiple Regression? Analyzing Sentence Reading Times with Hierarchical Linear Models

    ERIC Educational Resources Information Center

    Richter, Tobias

    2006-01-01

    Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…

  10. Local Linear Regression for Data with AR Errors.

    PubMed

    Li, Runze; Li, Yan

    2009-07-01

    In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for nonparametric regression by using the local linear regression method and profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of the auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedures, and to compare their performance with that of the existing one. In our empirical studies, the newly proposed procedures dramatically improve on the accuracy of naive local linear regression with a working-independent error structure. We illustrate the proposed methodology by an analysis of a real data set.

  11. Post-processing through linear regression

    NASA Astrophysics Data System (ADS)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-squares (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-squares method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. In contrast to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread should be preferred.

  12. Nonlinear isochrones in murine left ventricular pressure-volume loops: how well does the time-varying elastance concept hold?

    PubMed

    Claessens, T E; Georgakopoulos, D; Afanasyeva, M; Vermeersch, S J; Millar, H D; Stergiopulos, N; Westerhof, N; Verdonck, P R; Segers, P

    2006-04-01

    The linear time-varying elastance theory is frequently used to describe the change in ventricular stiffness during the cardiac cycle. The concept assumes that all isochrones (i.e., curves that connect pressure-volume data occurring at the same time) are linear and have a common volume intercept. Of specific interest is the steepest isochrone, the end-systolic pressure-volume relationship (ESPVR), of which the slope serves as an index for cardiac contractile function. Pressure-volume measurements, achieved with a combined pressure-conductance catheter in the left ventricle of 13 open-chest anesthetized mice, showed a marked curvilinearity of the isochrones. We therefore analyzed the shape of the isochrones by using six regression algorithms (two linear, two quadratic, and two logarithmic, each with a fixed or time-varying intercept) and discussed the consequences for the elastance concept. Our main observations were 1) the volume intercept varies considerably with time; 2) isochrones are equally well described by using quadratic or logarithmic regression; 3) linear regression with a fixed intercept shows poor correlation (R² < 0.75) during isovolumic relaxation and early filling; and 4) logarithmic regression is superior in estimating the fixed volume intercept of the ESPVR. In conclusion, the linear time-varying elastance fails to provide a sufficiently robust model to account for changes in pressure and volume during the cardiac cycle in the mouse ventricle. A new framework accounting for the nonlinear shape of the isochrones needs to be developed.

  13. Linear regression analysis of survival data with missing censoring indicators.

    PubMed

    Wang, Qihua; Dinse, Gregg E

    2011-04-01

    Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.

  14. Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

    PubMed

    Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

    2008-01-01

    This research's main goals were to build a predictor for a turnaround time (TAT) indicator in order to estimate its values, and to use a numerical clustering technique to find possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction, and insight characterisation. A multiple linear regression predictor of the TAT indicator and clustering techniques were used to improve corrective maintenance task efficiency in a clinical engineering department (CED). The variables contributing to the regression model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.

  15. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    EPA Science Inventory

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  16. Wavelet regression model in forecasting crude oil price

    NASA Astrophysics Data System (ADS)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.

  17. Use of probabilistic weights to enhance linear regression myoelectric control

    NASA Astrophysics Data System (ADS)

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

    2015-12-01

    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
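
    The weighting step described under Approach can be sketched for a single DOF as follows. This is a deliberately reduced illustration, not the study's implementation: it assumes one scalar feature instead of a multichannel EMG feature vector, equal class priors, a shared standard deviation (the one-dimensional analogue of the equal-covariance assumption), and invented class means and regression coefficients.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def class_posteriors(x, means, std):
    """Posterior probability of each movement class, assuming equal priors."""
    likelihoods = {name: gaussian_pdf(x, m, std) for name, m in means.items()}
    total = sum(likelihoods.values())
    return {name: value / total for name, value in likelihoods.items()}


def weighted_output(x, slope, intercept, means, std):
    """Scale the regression output by the probability of the implied movement."""
    raw = slope * x + intercept  # plain linear regression command
    posteriors = class_posteriors(x, means, std)
    intended = "positive" if raw > 0 else "negative"
    return raw * posteriors[intended]


# Hypothetical class means for one DOF: movement either way, or no movement.
MEANS = {"negative": -1.0, "rest": 0.0, "positive": 1.0}
STD = 0.3

small = weighted_output(0.05, 1.0, 0.0, MEANS, STD)  # feature near rest
large = weighted_output(1.0, 1.0, 0.0, MEANS, STD)   # clear positive movement
print(f"near rest: {small:.5f}, clear movement: {large:.3f}")
```

    With these assumptions, small regression outputs near the rest class are strongly attenuated while clear movements pass through almost unchanged, which is the mechanism the abstract credits for suppressing extraneous movement at other DOFs.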

  18. Estimating linear temporal trends from aggregated environmental monitoring data

    USGS Publications Warehouse

    Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.

    2017-01-01

    Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.

  19. RBF kernel based support vector regression to estimate the blood volume and heart rate responses during hemodialysis.

    PubMed

    Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H

    2009-01-01

    This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The ε-insensitive loss function was used for SVR modeling. Selection of the design parameters, which include capacity (C), insensitivity region (ε) and the RBF kernel parameter (sigma), was made based on a grid search approach, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).

  20. Comparison of two-concentration with multi-concentration linear regressions: Retrospective data analysis of multiple regulated LC-MS bioanalytical projects.

    PubMed

    Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi

    2013-09-01

    Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). 
    Furthermore, examples are given of how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis and it significantly saves time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
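
    Two-concentration calibration amounts to drawing the line through the lowest and highest calibrators and back-calculating unknowns from it. The sketch below shows that arithmetic only; the concentrations and instrument responses are invented, weighting schemes are ignored, and the function names are hypothetical.

```python
def two_point_calibration(low, high):
    """Slope and intercept of the line through two (concentration, response) points."""
    (x1, y1), (x2, y2) = low, high
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def back_calculate(response, slope, intercept):
    """Concentration that corresponds to a measured response."""
    return (response - intercept) / slope


def percent_bias(measured, nominal):
    """Relative deviation of a measured concentration from its nominal value."""
    return 100.0 * (measured - nominal) / nominal


# Made-up calibrators: lowest and highest concentration levels with responses.
slope, intercept = two_point_calibration((1.0, 0.02), (1000.0, 20.0))

# Made-up QC sample with a nominal concentration of 500.
qc_concentration = back_calculate(10.2, slope, intercept)
qc_bias = percent_bias(qc_concentration, 500.0)
print(f"QC concentration = {qc_concentration:.1f}, bias = {qc_bias:.1f}%")
```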

  1. Time-resolved flow reconstruction with indirect measurements using regression models and Kalman-filtered POD ROM

    NASA Astrophysics Data System (ADS)

    Leroux, Romain; Chatellier, Ludovic; David, Laurent

    2018-01-01

    This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of 20°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.

  2. Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data

    PubMed Central

    Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.

    2013-01-01

    Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
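
    The likelihood equivalence mentioned in this abstract can be checked numerically for the simplest case of a single constant rate. The sketch below is an illustration under stated assumptions, not the authors' model: the follow-up records are invented, there are no covariates or random effects, and the function names are hypothetical.

```python
import math

# Hypothetical follow-up records: (time at risk, event indicator), where
# 1 means a transition was observed and 0 means the record was censored.
records = [(2.0, 1), (0.5, 1), (3.5, 0), (1.2, 1), (4.0, 0), (0.8, 1)]


def exponential_loglik(rate, data):
    """Log-likelihood of censored exponential survival times."""
    # An observed event contributes log(rate); every record contributes
    # -rate * t through the survival function.
    return sum(d * math.log(rate) - rate * t for t, d in data)


def poisson_loglik(rate, data):
    """Log-likelihood treating each indicator as Poisson(rate * t).

    On the log scale the mean is log(rate) + log(t), so log(t) acts as
    the offset.
    """
    return sum(d * math.log(rate * t) - rate * t - math.lgamma(d + 1)
               for t, d in data)


# The two log-likelihoods differ by a term that does not involve the rate,
# so they are maximized by the same rate.
rates = [0.1, 0.3, 0.5, 1.0]
differences = [poisson_loglik(r, records) - exponential_loglik(r, records)
               for r in rates]

# Shared maximum-likelihood estimate: events divided by total time at risk.
mle_rate = sum(d for _, d in records) / sum(t for t, _ in records)

print(differences)
print(mle_rate)
```

    Because the difference between the two log-likelihoods is the same for every candidate rate, inference about the rate is identical under either formulation.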

  3. Spatio-temporal water quality mapping from satellite images using geographically and temporally weighted regression

    NASA Astrophysics Data System (ADS)

    Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua

    2018-03-01

    The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.

  4. On the equivalence of case-crossover and time series methods in environmental epidemiology.

    PubMed

    Lu, Yun; Zeger, Scott L

    2007-04-01

    The case-crossover design was introduced in epidemiology 15 years ago as a method for studying the effects of a risk factor on a health event using only cases. The idea is to compare a case's exposure immediately prior to or during the case-defining event with that same person's exposure at otherwise similar "reference" times. An alternative approach to the analysis of daily exposure and case-only data is time series analysis. Here, log-linear regression models express the expected total number of events on each day as a function of the exposure level and potential confounding variables. In time series analyses of air pollution, smooth functions of time and weather are the main confounders. Time series and case-crossover methods are often viewed as competing methods. In this paper, we show that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure such as in air pollution studies. This equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models. Time series log-linear regression accounts for overdispersion of the Poisson variance, while case-crossover analyses typically do not. This equivalence also permits model checking for case-crossover data using standard log-linear model diagnostics.

  5. BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER’S DISEASE*

    PubMed Central

    Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G

    2015-01-01

    The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412

  6. Mental chronometry with simple linear regression.

    PubMed

    Chen, J Y

    1997-10-01

    Typically, mental chronometry is performed by means of introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that involved a selective effect and with a task (word superiority effect) that involved a global effect, by the dominant theories. The results indicate (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These ratify the regression approach as a useful method for doing mental chronometry.

  7. Improving Prediction Accuracy for WSN Data Reduction by Applying Multivariate Spatio-Temporal Correlation

    PubMed Central

    Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman

    2011-01-01

    This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, we are probably the first to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626

  8. Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

    PubMed Central

    Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

    2015-01-01

    Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600

  9. Relationship between age and elite marathon race time in world single age records from 5 to 93 years

    PubMed Central

    2014-01-01

    Background The aims of the study were (i) to investigate the relationship between elite marathon race times and age in 1-year intervals by using the world single age records in marathon running from 5 to 93 years and (ii) to evaluate the sex difference in elite marathon running performance with advancing age. Methods World single age records in marathon running in 1-year intervals for women and men were analysed for changes across age using linear and non-linear regression analyses. Results The relationship between elite marathon race time and age was non-linear (i.e. polynomial regression 4th degree) for women and men. The curve was U-shaped where performance improved from 5 to ~20 years. From 5 years to ~15 years, boys and girls performed very similarly. Between ~20 and ~35 years, performance was quite linear, but started to decrease at the age of ~35 years in a curvilinear manner with increasing age in both women and men. The sex difference increased non-linearly (i.e. polynomial regression 7th degree) from 5 to ~20 years, remained unchanged at ~20 min from ~20 to ~50 years and increased thereafter. The sex difference was lowest (7.5%, 10.5 min) at the age of 49 years. Conclusion Elite marathon race times improved from 5 to ~20 years, remained linear between ~20 and ~35 years, and started to increase at the age of ~35 years in a curvilinear manner with increasing age in both women and men. The sex difference in elite marathon race time increased non-linearly and was lowest at the age of ~49 years. PMID:25120915

  10. Modification of the USLE K factor for soil erodibility assessment on calcareous soils in Iran

    NASA Astrophysics Data System (ADS)

    Ostovari, Yaser; Ghorbani-Dashtaki, Shoja; Bahrami, Hossein-Ali; Naderi, Mehdi; Dematte, Jose Alexandre M.; Kerry, Ruth

    2016-11-01

    The measurement of soil erodibility (K) in the field is tedious, time-consuming and expensive; therefore, its prediction through pedotransfer functions (PTFs) could be far less costly and time-consuming. The aim of this study was to develop new PTFs to estimate the K factor using multiple linear regression, Mamdani fuzzy inference systems, and artificial neural networks. For this purpose, K was measured in 40 erosion plots with natural rainfall. Various soil properties including the soil particle size distribution, calcium carbonate equivalent, organic matter, permeability, and wet-aggregate stability were measured. The results showed that the mean measured K was 0.014 t h MJ⁻¹ mm⁻¹ and 2.08 times less than the estimated mean K (0.030 t h MJ⁻¹ mm⁻¹) using the USLE model. Permeability, wet-aggregate stability, very fine sand, and calcium carbonate were selected as independent variables by forward stepwise regression in order to assess the ability of multiple linear regression, Mamdani fuzzy inference systems and artificial neural networks to predict K. The calcium carbonate equivalent, which is not accounted for in the USLE model, had a significant impact on K in multiple linear regression due to its strong influence on the stability of aggregates and soil permeability. Statistical indices in validation and calibration datasets determined that the artificial neural networks method with the highest R², lowest RMSE, and lowest ME was the best model for estimating the K factor. A strong correlation (R² = 0.81, n = 40, p < 0.05) between the estimated K from multiple linear regression and measured K indicates that the use of calcium carbonate equivalent as a predictor variable gives a better estimation of K in areas with calcareous soils.

  11. The Association of Sitting Time With Sarcopenia Status and Physical Performance at Baseline and 18-Month Follow-Up in the Residential Aged Care Setting.

    PubMed

    Reid, Natasha; Keogh, Justin W; Swinton, Paul; Gardiner, Paul A; Henwood, Timothy R

    2018-06-18

    This study investigated the association of sitting time with sarcopenia and physical performance in residential aged care residents at baseline and 18-month follow-up. Measures included the International Physical Activity Questionnaire (sitting time), European Working Group definition of sarcopenia, and the short physical performance battery (physical performance). Logistic regression and linear regression analyses were used to investigate associations. For each hour of sitting, the unadjusted odds ratio of sarcopenia was 1.16 (95% confidence interval [0.98, 1.37]). Linear regression showed that each hour of sitting was significantly associated with a 0.2-unit lower score for performance. Associations of baseline sitting with follow-up sarcopenia status and performance were nonsignificant. Cross-sectionally, increased sitting time in residential aged care may be detrimentally associated with sarcopenia and physical performance. Based on current reablement models of care, future studies should investigate if reducing sedentary time improves performance among adults in end of life care.

  12. Regression analysis using dependent Polya trees.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J

    2013-11-30

    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.

  13. Analysis and prediction of flow from local source in a river basin using a Neuro-fuzzy modeling tool.

    PubMed

    Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi

    2007-10-01

    Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, multiple linear regression will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.

  14. Tolerance of ciliated protozoan Paramecium bursaria (Protozoa, Ciliophora) to ammonia and nitrites

    NASA Astrophysics Data System (ADS)

    Xu, Henglong; Song, Weibo; Lu, Lu; Alan, Warren

    2005-09-01

    The tolerance to ammonia and nitrites in the freshwater ciliate Paramecium bursaria was measured in a conventional open system. The ciliate was exposed to different concentrations of ammonia and nitrites for 2 h and 12 h in order to determine the lethal concentrations. Linear regression analysis revealed that the 2 h-LC50 value for ammonia was 95.94 mg/L and for nitrite 27.35 mg/L using the probit scale method (with 95% confidence intervals). There was a linear correlation between the mortality probit scale and the logarithmic concentration of ammonia, which was fit by the regression equation y = 7.32x - 9.51 (R² = 0.98; y, mortality probit scale; x, logarithmic concentration of ammonia), by which the 2 h-LC50 value for ammonia was found to be 95.50 mg/L. A linear correlation between the mortality probit scale and the logarithmic concentration of nitrite also followed the regression equation y = 2.86x + 0.89 (R² = 0.95; y, mortality probit scale; x, logarithmic concentration of nitrite). The regression analysis of toxicity curves showed that the linear correlation between exposed time of ammonia-N LC50 value and ammonia-N LC50 value followed the regression equation y = 2862.85 e^(-0.08x) (R² = 0.95; y, duration of exposure to LC50 value; x, LC50 value), and that between exposed time of nitrite-N LC50 value and nitrite-N LC50 value followed the regression equation y = 127.15 e^(-0.13x) (R² = 0.91; y, exposed time of LC50 value; x, LC50 value). The results demonstrate that the tolerance to ammonia in P. bursaria is considerably higher than that of the larvae or juveniles of some metazoa, e.g. cultured prawns and oysters. In addition, ciliates, as bacterial predators, are likely to play a positive role in maintaining and improving water quality in aquatic environments with high-level ammonium, such as sewage treatment systems.
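
    An LC50 can be read off a probit regression line by solving for the concentration at which the line reaches the probit value for 50% mortality. The sketch below makes several assumptions that the abstract does not state: the logarithm is base 10, the probit scale is the classical one on which 50% mortality corresponds to 5, and the ammonia intercept is negative (the sign is not legible in the record, but a negative intercept is the only choice consistent with the reported LC50). The function name is hypothetical.

```python
def lc50_from_probit(slope, intercept, probit_at_50=5.0):
    """Solve probit = slope * log10(conc) + intercept for 50% mortality."""
    log_conc = (probit_at_50 - intercept) / slope
    return 10 ** log_conc


# Coefficients as given in the abstract (ammonia intercept sign assumed).
ammonia_lc50 = lc50_from_probit(slope=7.32, intercept=-9.51)
nitrite_lc50 = lc50_from_probit(slope=2.86, intercept=0.89)

print(f"ammonia 2 h-LC50: {ammonia_lc50:.1f} mg/L")
print(f"nitrite 2 h-LC50: {nitrite_lc50:.1f} mg/L")
```

    With the rounded coefficients, both results land within about 1 mg/L of the LC50 values reported in the abstract; the small gap for ammonia is consistent with rounding of the published slope and intercept.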

  15. Advanced statistics: linear regression, part I: simple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.

  16. Generating linear regression model to predict motor functions by use of laser range finder during TUG.

    PubMed

    Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki

    2017-05-01

    The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, variability of the aforementioned parameters, gait velocity, cadence, reaction time from starting signal to first step, and minimum distance between the foot and a marker placed 3 m in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P < 0.01), which was not a significantly higher correlation than that of TUG test time. We showed which TUG test parameters were associated with each motor function test. Moreover, the TUG test time, regarded as a measure of lower extremity function and mobility, has strong predictive ability in each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.

  17. Ergonomics study on mobile phones for thumb physiology discomfort

    NASA Astrophysics Data System (ADS)

    Bendero, J. M. S.; Doon, M. E. R.; Quiogue, K. C. A.; Soneja, L. C.; Ong, N. R.; Sauli, Z.; Vairavan, R.

    2017-09-01

    The study was conducted on Filipino undergraduate college students and aimed to identify the significant factors associated with mobile phone usage and its effect on thumb pain. A correlation-prediction analysis with multiple linear regression was adopted as the main tool for determining the significant factors and developing predictive models of thumb-related pain. Using the Statistical Package for the Social Sciences (SPSS) software to conduct the linear regression, 2 significant factors in thumb-related pain (percentage of time using portrait as screen orientation when text messaging, amount of time playing games using one hand in a day) were found.

  18. Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States

    NASA Astrophysics Data System (ADS)

    Yang, J.; Astitha, M.; Schwartz, C. S.

    2017-12-01

    Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over the northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performance of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of the GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real time.

  19. New method for calculating a mathematical expression for streamflow recession

    USGS Publications Warehouse

    Rutledge, Albert T.

    1991-01-01

    An empirical method has been devised to calculate the master recession curve, which is a mathematical expression for streamflow recession during times of negligible direct runoff. The method is based on the assumption that the storage-delay factor, which is the time per log cycle of streamflow recession, varies linearly with the logarithm of streamflow. The resulting master recession curve can be nonlinear. The method can be executed by a computer program that reads a data file of daily mean streamflow, then allows the user to select several near-linear segments of streamflow recession. The storage-delay factor for each segment is one of the coefficients of the equation that results from linear least-squares regression. Using results for each recession segment, a mathematical expression of the storage-delay factor as a function of the log of streamflow is determined by linear least-squares regression. The master recession curve, which is a second-order polynomial expression for time as a function of log of streamflow, is then derived using the coefficients of this function.
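
    The two regression stages described in this abstract lead to a second-order polynomial because integrating a linear function of log streamflow gives a quadratic. The sketch below illustrates that step only; it is not the published program. The segment values are invented, the helper names are hypothetical, the logarithm is assumed to be base 10, and the storage-delay factor K is assumed to equal the negative of the derivative of time with respect to log streamflow.

```python
def fit_line(xs, ys):
    """Unweighted least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical results for four selected recession segments:
# (mean log10 streamflow, storage-delay factor K in days per log cycle).
segments = [(1.0, 40.0), (1.5, 50.0), (2.0, 60.0), (2.5, 70.0)]

# Second-stage regression: K = a + b * logQ.
b, a = fit_line([s[0] for s in segments], [s[1] for s in segments])


def recession_time(log_q, log_q_start):
    """Days for flow to recede from log_q_start down to log_q.

    Integrates dt/d(logQ) = -(a + b * logQ), which yields a second-order
    polynomial in logQ.
    """
    def antiderivative(u):
        return a * u + 0.5 * b * u * u

    return antiderivative(log_q_start) - antiderivative(log_q)


print(a, b)
print(recession_time(1.0, 2.0))
```

    In this invented example K rises from 40 to 60 days per log cycle between logQ = 1 and logQ = 2, so receding through that one log cycle takes the average, 50 days.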

  20. Automating approximate Bayesian computation by local linear regression.

    PubMed

    Thornton, Kevin R

    2009-07-07

    In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, ABCreg simplifies implementing ABC based on local-linear regression.

  1. Forecasting daily patient volumes in the emergency department.

    PubMed

    Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L

    2008-02-01

    Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. 
This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.

  2. Time-Frequency Analysis of Non-Stationary Biological Signals with Sparse Linear Regression Based Fourier Linear Combiner.

    PubMed

    Wang, Yubo; Veluvolu, Kalyana C

    2017-06-14

    It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomenon. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model as a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance, and results show that the proposed method, Sparse-BMFLC, has a high mean energy ratio (0.9976) and outperforms existing methods such as the short-time Fourier transform (STFT), continuous wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.

  3. Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping

    EPA Science Inventory

    This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...

  4. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    PubMed

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM10 (particulate matter with aerodynamic diameter less than 10 μm) pollution a critical task. This requires knowledge of the factors associated with PM10 variation and good forecasts of PM10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built: a multiple linear regression (MLR) model with lagged predictor variables (MLR1), an MLR model with lagged predictor variables and PM10 concentrations (MLR2), and a regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM10 variation in Peninsular Malaysia. Comparison among the three models showed that the MLR2 model was on the same level as the RTSE model in terms of forecasting accuracy, while the MLR1 model was the worst.

  5. Time Series Analysis and Forecasting of Wastewater Inflow into Bandar Tun Razak Sewage Treatment Plant in Selangor, Malaysia

    NASA Astrophysics Data System (ADS)

    Abunama, Taher; Othman, Faridah

    2017-06-01

    Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to guarantee sufficient treatment of wastewater before discharging it to the environment. The main objectives of this study are to statistically analyze and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years' weekly influent data (156 weeks) has been conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) have been tried to select the best-fitting model, which was utilized to forecast the wastewater inflow rates. Linear regression analysis was applied to test the correlation between the observed and predicted influents. The ARIMA (3, 1, 3) model was selected with the highest significance R-square and lowest normalized Bayesian Information Criterion (BIC) value, and accordingly the wastewater inflow rates were forecasted for an additional 52 weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.

  6. Linear regression metamodeling as a tool to summarize and present simulation model results.

    PubMed

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.

  7. Effect of removing the common mode errors on linear regression analysis of noise amplitudes in position time series of a regional GPS network & a case study of GPS stations in Southern California

    NASA Astrophysics Data System (ADS)

    Jiang, Weiping; Ma, Jun; Li, Zhao; Zhou, Xiaohui; Zhou, Boye

    2018-05-01

    The analysis of the correlations between the noise in different components of GPS stations has positive significance to those trying to obtain more accurate uncertainty of velocity with respect to station motion. Previous research into noise in GPS position time series focused mainly on single component evaluation, which affects the acquisition of precise station positions, the velocity field, and its uncertainty. In this study, before and after removing the common-mode error (CME), we performed one-dimensional linear regression analysis of the noise amplitude vectors in different components of 126 GPS stations with a combination of white noise, flicker noise, and random walk noise in Southern California. The results show that, on the one hand, there are above-moderate degrees of correlation between the white noise amplitude vectors in all components of the stations before and after removal of the CME, while the correlations between flicker noise amplitude vectors in horizontal and vertical components are enhanced from uncorrelated to moderately correlated by removing the CME. On the other hand, the significance tests show that all of the obtained linear regression equations, which represent a unique function of the noise amplitude in any two components, are of practical value after removing the CME. According to the noise amplitude estimates in two components and the linear regression equations, more accurate noise amplitudes can be acquired in the two components.

  8. Detecting a Change in School Performance: A Bayesian Analysis for a Multilevel Join Point Problem. CSE Technical Report 542.

    ERIC Educational Resources Information Center

    Thum, Yeow Meng; Bhattacharya, Suman Kumar

    To better describe individual behavior within a system, this paper uses a sample of longitudinal test scores from a large urban school system to consider hierarchical Bayes estimation of a multilevel linear regression model in which each individual regression slope of test score on time switches at some unknown point in time, "kj."…

  9. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.

  10. Optimizing methods for linking cinematic features to fMRI data.

    PubMed

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors, the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. Non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. 
Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.

  11. Linear and evolutionary polynomial regression models to forecast coastal dynamics: Comparison and reliability assessment

    NASA Astrophysics Data System (ADS)

    Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe

    2018-01-01

    In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.

  12. GWAS with longitudinal phenotypes: performance of approximate procedures

    PubMed Central

    Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel

    2015-01-01

    Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. PMID:25712081

  13. FPGA implementation of predictive degradation model for engine oil lifetime

    NASA Astrophysics Data System (ADS)

    Idros, M. F. M.; Razak, A. H. A.; Junid, S. A. M. Al; Suliman, S. I.; Halim, A. K.

    2018-03-01

    This paper presents the implementation of a linear regression model for degradation prediction on Register Transfer Logic (RTL) using Quartus II. A stationary model was identified in the degradation trend of the engine oil in a vehicle using a time series method. For the RTL implementation, the degradation model is written in Verilog HDL and the input data are taken at a certain time. A clock divider was designed to support the timing sequence of the input data. For every five data points, a regression analysis is applied to determine the slope variation and calculate the prediction. Only negative values are considered for prediction purposes, to reduce the number of logic gates. The Least Squares Method is applied to obtain the best linear model based on the mean values of the time series data. The coded algorithm has been implemented on an FPGA for validation purposes. The result shows the predicted time to change the engine oil.
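
    The block-wise regression described in this abstract can be sketched in software as follows. This is only an illustrative Python sketch, not the authors' Verilog design: the function names, the `threshold` parameter, the reading of "negative value" as a negative fitted slope, and the sample readings are all assumptions made for the example.

```python
def least_squares_line(ts, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def predict_change_time(ts, ys, threshold, window=5):
    """Fit a line to each consecutive block of `window` samples.

    For blocks with a negative slope (assumed here to mean degradation),
    extrapolate the time at which the fitted line reaches `threshold`.
    Returns a list of (block_end_time, slope, predicted_time) tuples.
    """
    predictions = []
    for start in range(0, len(ts) - window + 1, window):
        block_t = ts[start:start + window]
        block_y = ys[start:start + window]
        slope, intercept = least_squares_line(block_t, block_y)
        if slope < 0:  # skip blocks that show no degradation
            predicted = (threshold - intercept) / slope
            predictions.append((block_t[-1], slope, predicted))
    return predictions


# Hypothetical oil-quality readings (arbitrary units) at unit time steps.
times = list(range(10))
quality = [100.0, 98.0, 96.0, 94.0, 92.0, 90.0, 89.0, 88.0, 87.0, 86.0]
for end_time, slope, t_change in predict_change_time(times, quality, threshold=50.0):
    print(end_time, round(slope, 3), round(t_change, 2))
```

    With these made-up readings the first block degrades faster than the second, so the extrapolated change time moves later once the slope flattens.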

  14. Distance correction system for localization based on linear regression and smoothing in ambient intelligence display.

    PubMed

    Kim, Dae-Hee; Choi, Jae-Hun; Lim, Myung-Eun; Park, Soo-Jun

    2008-01-01

    This paper suggests a method of correcting the distance between an ambient intelligence display and a user based on linear regression and smoothing, by which distance information of a user who approaches the display can be accurately output even in an unanticipated condition using a passive infrared (PIR) sensor and an ultrasonic device. The developed system consists of an ambient intelligence display, an ultrasonic transmitter, and a sensor gateway. The modules communicate with each other through RF (radio frequency) communication. The ambient intelligence display includes an ultrasonic receiver and a PIR sensor for motion detection. In particular, this system dynamically selects and applies algorithms such as smoothing or linear regression to the current input data through a judgment process that uses the previous reliable data stored in a queue. In addition, we implemented GUI software in Java for real-time location tracking and an ambient intelligence display.

  15. Analysis of reciprocal creatinine plots by two-phase linear regression.

    PubMed

    Rowe, P A; Richardson, R E; Burton, P R; Morgan, A G; Burden, R P

    1989-01-01

    The progression of renal diseases is often monitored by the serial measurement of plasma creatinine. The slope of the linear relation that is frequently found between the reciprocal of creatinine concentration and time delineates the rate of change in renal function. Minor changes in slope, perhaps indicating response to therapeutic intervention, can be difficult to identify and yet be of clinical importance. We describe the application of two-phase linear regression to identify and characterise changes in slope using a microcomputer. The method fits two intersecting lines to the data by computing a least-squares estimate of the position of the slope change and its 95% confidence limits. This avoids the potential bias of fixing the change at a preconceived time corresponding with an alteration in treatment. The program then evaluates the statistical and clinical significance of the slope change and produces a graphical output to aid interpretation.
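
    A minimal sketch of fitting two intersecting lines with a least-squares breakpoint is given below. It assumes a continuous model y = a + b*t + c*max(0, t - tau) and a simple grid search over candidate breakpoints; the original microcomputer program and its 95% confidence limits are not reproduced, and all names and data here are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_two_phase(ts, ys, candidates):
    """Fit y = a + b*t + c*max(0, t - tau) for each candidate tau.

    Candidates must lie strictly inside the observed time range, otherwise
    the design matrix is singular. Returns the fit with the smallest
    residual sum of squares.
    """
    best = None
    for tau in candidates:
        rows = [[1.0, t, max(0.0, t - tau)] for t in ts]
        # Normal equations: (X^T X) beta = X^T y
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        a, b, c = solve_linear_system(xtx, xty)
        rss = sum((y - (a + b * r[1] + c * r[2])) ** 2 for r, y in zip(rows, ys))
        if best is None or rss < best["rss"]:
            best = {"tau": tau, "intercept": a, "slope_before": b,
                    "slope_after": b + c, "rss": rss}
    return best


# Synthetic, noise-free reciprocal-creatinine values with a slope change at month 6.
months = [float(t) for t in range(13)]
recip_creatinine = [1.0 - 0.02 * t - 0.03 * max(0.0, t - 6.0) for t in months]
fit = fit_two_phase(months, recip_creatinine, candidates=range(2, 11))
print(fit["tau"], round(fit["slope_before"], 4), round(fit["slope_after"], 4))
```

    Searching over the breakpoint rather than fixing it in advance mirrors the abstract's point about avoiding a preconceived change time.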

  16. Analysis and generation of groundwater concentration time series

    NASA Astrophysics Data System (ADS)

    Crăciun, Maria; Vamoş, Călin; Suciu, Nicolae

    2018-01-01

    Concentration time series are provided by simulated concentrations of a nonreactive solute transported in groundwater, integrated over the transverse direction of a two-dimensional computational domain and recorded at the plume center of mass. The analysis of a statistical ensemble of time series reveals subtle features that are not captured by the first two moments which characterize the approximate Gaussian distribution of the two-dimensional concentration fields. The concentration time series exhibit a complex preasymptotic behavior driven by a nonstationary trend and correlated fluctuations with time-variable amplitude. Time series with almost the same statistics are generated by successively adding to a time-dependent trend a sum of linear regression terms, accounting for correlations between fluctuations around the trend and their increments in time, and terms of an amplitude modulated autoregressive noise of order one with time-varying parameter. The algorithm generalizes mixing models used in probability density function approaches. The well-known interaction by exchange with the mean mixing model is a special case consisting of a linear regression with constant coefficients.

  17. Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

    PubMed Central

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-01-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882

  18. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    PubMed

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  19. Taxi-Out Time Prediction for Departures at Charlotte Airport Using Machine Learning Techniques

    NASA Technical Reports Server (NTRS)

    Lee, Hanbong; Malik, Waqar; Jung, Yoon C.

    2016-01-01

    Predicting the taxi-out times of departures accurately is important for improving airport efficiency and takeoff time predictability. In this paper, we attempt to apply machine learning techniques to actual traffic data at Charlotte Douglas International Airport for taxi-out time prediction. To find the key factors affecting aircraft taxi times, surface surveillance data is first analyzed. From this data analysis, several variables, including terminal concourse, spot, runway, departure fix and weight class, are selected for taxi time prediction. Then, various machine learning methods such as linear regression, support vector machines, k-nearest neighbors, random forest, and neural networks model are applied to actual flight data. Different traffic flow and weather conditions at Charlotte airport are also taken into account for more accurate prediction. The taxi-out time prediction results show that linear regression and random forest techniques can provide the most accurate prediction in terms of root-mean-square errors. We also discuss the operational complexity and uncertainties that make it difficult to predict the taxi times accurately.

  20. Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.

    PubMed

    Trninić, Marko; Jeličić, Mario; Papić, Vladan

    2015-07-01

    In kinesiology, medicine, biology and psychology, in which research focus is on dynamical self-organized systems, complex connections exist between variables. Non-linear nature of complex systems has been discussed and explained by the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided in three groups according to their playing time (8 minutes and more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear (general model) and non-linear (general model) regression models were calculated simultaneously and separately for each group. The conclusion is viable: non-linear regressions are frequently superior to linear correlations when interpreting actual association logic among research variables.

  1. Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.

    PubMed

    Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W

    1992-03-01

    The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy for estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by Additive Main effects and the Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.

  2. Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

    PubMed Central

    Guan, Yongtao; Li, Yehua; Sinha, Rajita

    2011-01-01

    In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854

  3. Area under the curve predictions of dalbavancin, a new lipoglycopeptide agent, using the end of intravenous infusion concentration data point by regression analyses such as linear, log-linear and power models.

    PubMed

    Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally

    2018-02-01

    1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. The area under the plasma concentration versus time curve (AUCinf) of dalbavancin is a key parameter and the AUCinf/MIC ratio is a critical pharmacodynamic marker. 2. Using the end of intravenous infusion concentration (i.e. Cmax), a Cmax versus AUCinf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUCinf were performed using published Cmax data by application of regression equations. The quotient of observed/predicted values rendered the fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The Cmax versus AUCinf relationship exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE < 10.3%. The external data evaluation showed that the models predicted AUCinf with a RMSE of 3.02-27.46% with fold difference largely contained within 0.64-1.48. 5. Regardless of the regression models, a single time point strategy of using Cmax (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUCinf of dalbavancin in patients.
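
    As an illustration of the single-point approach, the following Python sketch fits a power model AUC = a * Cmax**b by a straight-line fit in log-log space and computes the observed/predicted fold difference and an RMSE. The data are synthetic and the function names are hypothetical; the abstract does not state how its percentage RMSE was normalised, so this sketch reports RMSE in the units of the observations.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def fit_power_model(cmax, auc):
    """Fit AUC = a * Cmax**b via a line in log-log space; returns (a, b)."""
    b, log_a = fit_line([math.log(c) for c in cmax], [math.log(v) for v in auc])
    return math.exp(log_a), b


def fold_difference(observed, predicted):
    """Observed/predicted quotient for each subject."""
    return [o / p for o, p in zip(observed, predicted)]


def rmse(observed, predicted):
    """Root mean square error in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Synthetic, noise-free pairs generated from AUC = 2 * Cmax**1.5 (not study data).
cmax_values = [100.0, 200.0, 300.0, 400.0]
auc_values = [2.0 * c ** 1.5 for c in cmax_values]
a, b = fit_power_model(cmax_values, auc_values)
predicted = [a * c ** b for c in cmax_values]
print(round(a, 4), round(b, 4))
```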

  4. Applicational possibilities of linear and non-linear (polynomial) regression and analysis of variance. III. Stability determination of pharmaceutical preparations: stability of diclofenac-sodium in Diclofen injections.

    PubMed

    Arambasić, M B; Jatić-Slavković, D

    2004-05-01

    This paper presents the application of a regression analysis program and a program for comparing linear regressions (a modified method for one-way analysis of variance), written in the BASIC programming language, using as an example the determination of the content of Diclofenac-Sodium (the active ingredient in DIKLOFEN injections, ampules á 75 mg/3 ml). Stability testing of Diclofenac-Sodium was done by the isothermal method of accelerated aging at 4 different temperatures (30 degrees, 40 degrees, 50 degrees and 60 degrees C) as a function of time (4 different durations of treatment: 0-155, 0-145, 0-74 and 0-44 days). The decrease in stability (the decrease in the mean value of the content of Diclofenac-Sodium, in %) at different temperatures as a function of time can be described by a linear dependence. From the regression equations, the times were assessed in which the content of Diclofenac-Sodium (in %) will decrease by 10% of the initial value. The times are as follows: at 30 degrees C 761.02 days, at 40 degrees C 397.26 days, at 50 degrees C 201.96 days and at 60 degrees C 58.85 days. The estimated times (in days) in which the mean value for Diclofenac-Sodium content (in %) will fall by 10% of the initial value, as a function of time, are most suitably described by a 3rd order parabola. Based on the parameter values which describe the 3rd order parabola, the time was estimated in which the Diclofenac-Sodium content mean value (in %) will fall by 10% of the initial one at average ambient temperatures of 20 degrees C and 25 degrees C. The times are: 1409.47 days (20 degrees C) and 1042.39 days (25 degrees C). Based on the value of Fisher's coefficient (F), the comparison of the trends of Diclofenac-Sodium content (in %) under the influence of different temperatures as a function of time shows that, depending on the temperature, there is: a statistically very significant difference (P < .05) at 50 degrees C and lower toward 60 degrees C, i.e. a statistically probably significant difference (P > 0.01) at 40 degrees C and lower towards 50 degrees C, and no statistically significant difference (P > 0.05) at 30 degrees C towards 40 degrees C.
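
    The shelf-life estimate described in this abstract (fit content against time, then solve for the time at which content has fallen by 10% of its initial value) can be sketched as follows. This is a minimal illustration, not the authors' BASIC program: the function names and the data are hypothetical, and the fitted intercept is assumed to stand in for the initial value.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def time_to_loss(intercept, slope, loss=0.10):
    """Time at which the fitted line falls by `loss` of its value at t = 0."""
    if slope >= 0:
        raise ValueError("content does not decrease over time")
    # Solve intercept + slope * t = (1 - loss) * intercept for t.
    return -loss * intercept / slope


# Hypothetical data (not from the paper): storage time in days, mean content in %.
days = [0, 10, 20, 30, 40]
content = [100.0, 99.0, 98.0, 97.0, 96.0]

b0, b1 = fit_line(days, content)
t10 = time_to_loss(b0, b1)
print(f"estimated time to a 10% loss: {t10:.1f} days")
```

    Repeating the fit at each storage temperature gives one such time per temperature, which is the quantity the abstract then extrapolates to 20 and 25 degrees C.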

  5. A comparative study between nonlinear regression and artificial neural network approaches for modelling wild oat (Avena fatua) field emergence

    USDA-ARS's Scientific Manuscript database

    Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...

  6. Comparative study of Poincaré plot analysis using short electroencephalogram signals during anaesthesia with spectral edge frequency 95 and bispectral index.

    PubMed

    Hayashi, K; Yamada, T; Sawa, T

    2015-03-01

    The return or Poincaré plot is a non-linear analytical approach in a two-dimensional plane, where a timed signal is plotted against itself after a time delay. Its scatter pattern reflects the randomness and variability in the signals. Quantification of a Poincaré plot of the electroencephalogram has potential to determine anaesthesia depth. We quantified the degree of dispersion (i.e. standard deviation, SD) along the diagonal line of the electroencephalogram-Poincaré plot (named as SD1/SD2), and compared SD1/SD2 values with spectral edge frequency 95 (SEF95) and bispectral index values. The regression analysis showed a tight linear regression equation with a coefficient of determination (R²) value of 0.904 (p < 0.0001) between the Poincaré index (SD1/SD2) and SEF95, and a moderate linear regression equation between SD1/SD2 and bispectral index (R² = 0.346, p < 0.0001). Quantification of the Poincaré plot tightly correlates with SEF95, reflecting anaesthesia-dependent changes in electroencephalogram oscillation. © 2014 The Association of Anaesthetists of Great Britain and Ireland.
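
    A Poincaré plot pairs each sample with the sample one delay later, and its spread is usually summarised by two standard deviations. The sketch below uses the conventional definitions (SD1 across the identity line, SD2 along it); the abstract does not state the delay or the exact formulas used, so the lag of one sample, the population standard deviation, and the function name are assumptions.

```python
import math
import statistics


def poincare_sd(signal, lag=1):
    """Return (SD1, SD2) of the lagged scatter (signal[n], signal[n + lag])."""
    current = signal[:-lag]
    delayed = signal[lag:]
    # Rotate each point by 45 degrees: one axis runs across the identity
    # line, the other runs along it.
    across = [(b - a) / math.sqrt(2) for a, b in zip(current, delayed)]
    along = [(b + a) / math.sqrt(2) for a, b in zip(current, delayed)]
    return statistics.pstdev(across), statistics.pstdev(along)


# Hypothetical samples, for illustration only.
samples = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0]
sd1, sd2 = poincare_sd(samples)
if sd2 > 0:
    print(f"SD1 = {sd1:.3f}, SD2 = {sd2:.3f}, SD1/SD2 = {sd1 / sd2:.3f}")
```

    A steadily rising signal places every point on a line parallel to the diagonal, so SD1 is zero, while an erratic signal scatters points away from the diagonal and raises SD1 relative to SD2.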

  7. Cooperation without culture? The null effect of generalized trust on intentional homicide: a cross-national panel analysis, 1995-2009.

    PubMed

    Robbins, Blaine

    2013-01-01

    Sociologists, political scientists, and economists all suggest that culture plays a pivotal role in the development of large-scale cooperation. In this study, I used generalized trust as a measure of culture to explore if and how culture impacts intentional homicide, my operationalization of cooperation. I compiled multiple cross-national data sets and used pooled time-series linear regression, single-equation instrumental-variables linear regression, and fixed- and random-effects estimation techniques on an unbalanced panel of 118 countries and 232 observations spread over a 15-year time period. Results suggest that culture and large-scale cooperation form a tenuous relationship, while economic factors such as development, inequality, and geopolitics appear to drive large-scale cooperation.

  8. Objectively measured sedentary time and academic achievement in schoolchildren.

    PubMed

    Lopes, Luís; Santos, Rute; Mota, Jorge; Pereira, Beatriz; Lopes, Vítor

    2017-03-01

    This study aimed to evaluate the relationship between objectively measured total sedentary time and academic achievement (AA) in Portuguese children. The sample comprised of 213 children (51.6% girls) aged 9.46 ± 0.43 years, from the north of Portugal. Sedentary time was measured with accelerometry, and AA was assessed using the Portuguese Language and Mathematics National Exams results. Multilevel linear regression models were fitted to assess regression coefficients predicting AA. The results showed that objectively measured total sedentary time was not associated with AA, after adjusting for potential confounders.

  9. Pseudo-second order models for the adsorption of safranin onto activated carbon: comparison of linear and non-linear regression methods.

    PubMed

    Kumar, K Vasanth

    2007-04-02

    Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
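
    The linear route compared in this abstract rests on rewriting the pseudo-second-order model q_t = k q_e^2 t / (1 + k q_e t) as t/q_t = 1/(k q_e^2) + t/q_e, so that a straight-line fit of t/q_t against t yields both parameters. The sketch below shows only this widely used linearized form on synthetic, noise-free data; the function names and parameter values are hypothetical and nothing here reproduces the paper's data or its non-linear fits.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pso_uptake(t, q_e, k):
    """Pseudo-second-order uptake q_t at time t."""
    return k * q_e ** 2 * t / (1 + k * q_e * t)


def fit_pso_linearized(times, uptakes):
    """Estimate (q_e, k) from a line fitted to t/q_t versus t."""
    ratios = [t / q for t, q in zip(times, uptakes)]
    intercept, slope = fit_line(times, ratios)
    q_e = 1 / slope              # slope = 1 / q_e
    k = slope ** 2 / intercept   # intercept = 1 / (k * q_e ** 2)
    return q_e, k


# Synthetic data generated from assumed parameters (q_e = 50, k = 0.002).
times = [5.0, 10.0, 20.0, 40.0, 80.0]
uptakes = [pso_uptake(t, 50.0, 0.002) for t in times]
q_e_hat, k_hat = fit_pso_linearized(times, uptakes)
print(f"q_e = {q_e_hat:.3f}, k = {k_hat:.6f}")
```

    With noisy measurements the transformation to t/q_t changes how errors are weighted, which is one reason linear and non-linear estimates can differ.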

  10. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression imposes strict assumptions on the model, whereas nonparametric regression does not require an assumed model form. Time series data are observations of a variable recorded over time, so to model a time series by regression, the response and predictor variables must be determined first. The response variable is the value at time t (yt), while the predictor variables are the significant lags. In nonparametric regression modeling, one developing approach is to use Fourier series. One advantage of the Fourier series approach is that it can handle data with a trigonometric distribution. Modeling with Fourier series requires the parameter K. The number K can be determined using the Generalized Cross Validation method. In inflation modeling for the transportation, communication and financial services sector, the Fourier series yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.

  11. Weather Impact on Airport Arrival Meter Fix Throughput

    NASA Technical Reports Server (NTRS)

    Wang, Yao

    2017-01-01

    Time-based flow management provides arrival aircraft schedules based on arrival airport conditions, airport capacity, required spacing, and weather conditions. In order to meet a scheduled time at which arrival aircraft can cross an airport arrival meter fix prior to entering the airport terminal airspace, air traffic controllers make regulations on air traffic. Severe weather may create an airport arrival bottleneck if one or more of airport arrival meter fixes are partially or completely blocked by the weather and the arrival demand has not been reduced accordingly. Under these conditions, aircraft are frequently being put in holding patterns until they can be rerouted. A model that predicts the weather impacted meter fix throughput may help air traffic controllers direct arrival flows into the airport more efficiently, minimizing arrival meter fix congestion. This paper presents an analysis of air traffic flows across arrival meter fixes at the Newark Liberty International Airport (EWR). Several scenarios of weather impacted EWR arrival fix flows are described. Furthermore, multiple linear regression and regression tree ensemble learning approaches for translating multiple sector Weather Impacted Traffic Indexes (WITI) to EWR arrival meter fix throughputs are examined. These weather translation models are developed and validated using the EWR arrival flight and weather data for the period of April-September in 2014. This study also compares the performance of the regression tree ensemble with traditional multiple linear regression models for estimating the weather impacted throughputs at each of the EWR arrival meter fixes. For all meter fixes investigated, the results from the regression tree ensemble weather translation models show a stronger correlation between model outputs and observed meter fix throughputs than that produced from multiple linear regression method.

  12. Horses Auto-Recruit Their Lungs by Inspiratory Breath Holding Following Recovery from General Anaesthesia

    PubMed Central

    Mosing, Martina; Waldmann, Andreas D.; MacFarlane, Paul; Iff, Samuel; Auer, Ulrike; Bohm, Stephan H.; Bettschart-Wolfensberger, Regula; Bardell, David

    2016-01-01

    This study evaluated the breathing pattern and distribution of ventilation in horses prior to and following recovery from general anaesthesia using electrical impedance tomography (EIT). Six horses were anaesthetised for 6 hours in dorsal recumbency. Arterial blood gas and EIT measurements were performed 24 hours before (baseline) and 1, 2, 3, 4, 5 and 6 hours after horses stood following anaesthesia. At each time point 4 representative spontaneous breaths were analysed. The percentage of the total breath length during which impedance remained greater than 50% of the maximum inspiratory impedance change (breath holding), the fraction of total tidal ventilation within each of four stacked regions of interest (ROI) (distribution of ventilation) and the filling time and inflation period of seven ROI evenly distributed over the dorso-ventral height of the lungs were calculated. Mixed effects multi-linear regression and linear regression were used and significance was set at p<0.05. All horses demonstrated inspiratory breath holding until 5 hours after standing. No change from baseline was seen for the distribution of ventilation during inspiration. Filling time and inflation period were more rapid and shorter in ventral and slower and longer in most dorsal ROI compared to baseline, respectively. In a mixed effects multi-linear regression, breath holding was significantly correlated with PaCO2 in both the univariate and multivariate regression. Following recovery from anaesthesia, horses showed inspiratory breath holding during which gas redistributed from ventral into dorsal regions of the lungs. This suggests auto-recruitment of lung tissue which would have been dependent and likely atelectic during anaesthesia. PMID:27331910

  13. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    NASA Astrophysics Data System (ADS)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of the multiple linear regression model and the fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using the multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analysis of normality and multicollinearity indicates that the data are normally distributed without multicollinearity among independent variables. Fuzzy c-means analysis clusters the yield of paddy into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model, with a lower mean square error.

  14. A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

    PubMed

    Anderson, Carl A; McRae, Allan F; Visscher, Peter M

    2006-07-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.

  15. Linear regression crash prediction models : issues and proposed solutions.

    DOT National Transportation Integrated Search

    2010-05-01

    The paper develops a linear regression model approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novice data aggregation to satisfy linear regression assumptions; namely error structure normality ...

  16. Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

    ERIC Educational Resources Information Center

    Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

    2013-01-01

    In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…

  17. Profile local linear estimation of generalized semiparametric regression model for longitudinal data.

    PubMed

    Sun, Yanqing; Sun, Liuquan; Zhou, Jie

    2013-07-01

    This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit to the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.

  18. Weighted linear regression using D2H and D2 as the independent variables

    Treesearch

    Hans T. Schreuder; Michael S. Williams

    1998-01-01

    Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared ( D2H = diameter squared times height or D...

  19. Equilibrium, kinetics and process design of acid yellow 132 adsorption onto red pine sawdust.

    PubMed

    Can, Mustafa

    2015-01-01

    Linear and non-linear regression procedures have been applied to the Langmuir, Freundlich, Tempkin, Dubinin-Radushkevich, and Redlich-Peterson isotherms for adsorption of acid yellow 132 (AY132) dye onto red pine (Pinus resinosa) sawdust. The effects of parameters such as particle size, stirring rate, contact time, dye concentration, adsorption dose, pH, and temperature were investigated, and interaction was characterized by Fourier transform infrared spectroscopy and field emission scanning electron microscope. The non-linear method of the Langmuir isotherm equation was found to be the best fitting model to the equilibrium data. The maximum monolayer adsorption capacity was found as 79.5 mg/g. The calculated thermodynamic results suggested that AY132 adsorption onto red pine sawdust was an exothermic, physisorption, and spontaneous process. Kinetics was analyzed by four different kinetic equations using non-linear regression analysis. The pseudo-second-order equation provides the best fit with experimental data.

  20. Neural network and multiple linear regression to predict school children dimensions for ergonomic school furniture design.

    PubMed

    Agha, Salah R; Alnahhal, Mohammed J

    2012-11-01

    The current study investigates the possibility of obtaining the anthropometric dimensions, critical to school furniture design, without measuring all of them. The study first selects some anthropometric dimensions that are easy to measure. Two methods are then used to check if these easy-to-measure dimensions can predict the dimensions critical to the furniture design. These methods are multiple linear regression and neural networks. Each dimension that is deemed necessary to ergonomically design school furniture is expressed as a function of some other measured anthropometric dimensions. Results show that out of the five dimensions needed for chair design, four can be related to other dimensions that can be measured while children are standing. Therefore, the method suggested here would definitely save time and effort and avoid the difficulty of dealing with students while measuring these dimensions. In general, it was found that neural networks perform better than multiple linear regression in the current study. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  1. The Effect of Information Level on Human-Agent Interaction for Route Planning

    DTIC Science & Technology

    2015-12-01

    13 Fig. 4 Experiment 1 shows regression results for time spent at DP predicting posttest trust group membership for the high LOI...decision time by pretest trust group membership. Bars denote standard error (SE). DT at DP was evaluated to see if it predicted posttest trust... group . Linear regression indicated that DT at DP was not a significant predictor of posttest trust for the Low or the Medium LOI conditions; however, it

  2. A hierarchical linear model for tree height prediction.

    Treesearch

    Vicente J. Monleon

    2003-01-01

    Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...

  3. An hourly PM10 diagnosis model for the Bilbao metropolitan area using a linear regression methodology.

    PubMed

    González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O

    2013-07-01

    There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas, becoming one of the main air pollution concerns. This is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative (log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training dataset (6 years) and the testing dataset (1 year). The results of each type of model show that the INT-BIC-based model (R² = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the-art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 for seasonal means) with respect to shorter periods.

  4. The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Sinharay, Sandip

    2010-01-01

    Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…

  5. Cooperation without Culture? The Null Effect of Generalized Trust on Intentional Homicide: A Cross-National Panel Analysis, 1995–2009

    PubMed Central

    Robbins, Blaine

    2013-01-01

    Sociologists, political scientists, and economists all suggest that culture plays a pivotal role in the development of large-scale cooperation. In this study, I used generalized trust as a measure of culture to explore if and how culture impacts intentional homicide, my operationalization of cooperation. I compiled multiple cross-national data sets and used pooled time-series linear regression, single-equation instrumental-variables linear regression, and fixed- and random-effects estimation techniques on an unbalanced panel of 118 countries and 232 observations spread over a 15-year time period. Results suggest that culture and large-scale cooperation form a tenuous relationship, while economic factors such as development, inequality, and geopolitics appear to drive large-scale cooperation. PMID:23527211

  6. Forecasting Enrollments with Fuzzy Time Series.

    ERIC Educational Resources Information Center

    Song, Qiang; Chissom, Brad S.

    The concept of fuzzy time series is introduced and used to forecast the enrollment of a university. Fuzzy time series, an aspect of fuzzy set theory, forecasts enrollment using a first-order time-invariant model. To evaluate the model, the conventional linear regression technique is applied and the predicted values obtained are compared to the…

  7. The isoform A of reticulon-4 (Nogo-A) in cerebrospinal fluid of primary brain tumor patients: influencing factors.

    PubMed

    Koper, Olga Martyna; Kamińska, Joanna; Milewska, Anna; Sawicki, Karol; Mariak, Zenon; Kemona, Halina; Matowicka-Karna, Joanna

    2018-05-18

    The influence of isoform A of reticulon-4 (Nogo-A), also known as neurite outgrowth inhibitor, on primary brain tumor development was reported. Therefore the aim was the evaluation of Nogo-A concentrations in cerebrospinal fluid (CSF) and serum of brain tumor patients compared with non-tumoral individuals. All serum results, except for two cases, obtained both in brain tumors and non-tumoral individuals, were below the lower limit of ELISA detection. Cerebrospinal fluid Nogo-A concentrations were significantly lower in primary brain tumor patients compared to non-tumoral individuals. The univariate linear regression analysis found that if white blood cell count increases by 1 × 10^3/μL, the mean cerebrospinal fluid Nogo-A concentration value decreases 1.12 times. In the model of multiple linear regression analysis, predictor variables influencing cerebrospinal fluid Nogo-A concentrations included: diagnosis, sex, and sodium level. The mean cerebrospinal fluid Nogo-A concentration value was 1.9 times higher for women in comparison to men. In the astrocytic brain tumor group, higher sodium level occurs with lower cerebrospinal fluid Nogo-A concentrations. We found the opposite situation in non-tumoral individuals. Univariate linear regression analysis revealed that cerebrospinal fluid Nogo-A concentrations change in relation to white blood cell count. In the created model of multiple linear regression analysis, we found that the predictor variables influencing CSF Nogo-A concentrations were diagnosis, sex, and sodium level. Results may be relevant to the search for cerebrospinal fluid biomarkers and potential therapeutic targets in primary brain tumor patients. Nogo-A concentrations were tested by means of enzyme-linked immunosorbent assay (ELISA).

  8. Self-organising mixture autoregressive model for non-stationary time series modelling.

    PubMed

    Ni, He; Yin, Hujun

    2008-12-01

    Modelling non-stationary time series has been a difficult task for both parametric and nonparametric methods. One promising solution is to combine the flexibility of nonparametric models with the simplicity of parametric models. In this paper, the self-organising mixture autoregressive (SOMAR) network is adopted as a such mixture model. It breaks time series into underlying segments and at the same time fits local linear regressive models to the clusters of segments. In such a way, a global non-stationary time series is represented by a dynamic set of local linear regressive models. Neural gas is used for a more flexible structure of the mixture model. Furthermore, a new similarity measure has been introduced in the self-organising network to better quantify the similarity of time series segments. The network can be used naturally in modelling and forecasting non-stationary time series. Experiments on artificial, benchmark time series (e.g. Mackey-Glass) and real-world data (e.g. numbers of sunspots and Forex rates) are presented and the results show that the proposed SOMAR network is effective and superior to other similar approaches.

  9. Double-time correlation functions of two quantum operations in open systems

    NASA Astrophysics Data System (ADS)

    Ban, Masashi

    2017-10-01

    A double-time correlation function of arbitrary two quantum operations is studied for a nonstationary open quantum system which is in contact with a thermal reservoir. It includes a usual correlation function, a linear response function, and a weak value of an observable. Time evolution of the correlation function can be derived by means of the time-convolution and time-convolutionless projection operator techniques. For this purpose, a quasidensity operator accompanied by a fictitious field is introduced, which makes it possible to derive explicit formulas for calculating a double-time correlation function in the second-order approximation with respect to a system-reservoir interaction. The derived formula explicitly shows that the quantum regression theorem for calculating the double-time correlation function cannot be used if a thermal reservoir has a finite correlation time. Furthermore, the formula is applied for a pure dephasing process and a linear dissipative process. The quantum regression theorem and the Leggett-Garg inequality are investigated for an open two-level system. The results are compared with those obtained by exact calculation to examine whether the formula is a good approximation.

  10. Cardiac surgery productivity and throughput improvements.

    PubMed

    Lehtonen, Juha-Matti; Kujala, Jaakko; Kouri, Juhani; Hippeläinen, Mikko

    2007-01-01

    The high variability in cardiac surgery length is one of the main challenges for staff managing productivity. This study aims to evaluate the impact of six interventions on open-heart surgery operating theatre productivity. A discrete operating theatre event simulation model with empirical operation time input data from 2603 patients is used to evaluate the effect that these process interventions have on the surgery output and overtime work. A linear regression model was used to get operation time forecasts for surgery scheduling while it also could be used to explain operation time. A forecasting model based on the linear regression of variables available before the surgery explains 46 per cent of operating time variance. The main factors influencing operation length were type of operation, redoing the operation and the head surgeon. Reduction of changeover time between surgeries by inducing anaesthesia outside an operating theatre and by reducing slack time at the end of day after a second surgery have the strongest effects on surgery output and productivity. A more accurate operation time forecast did not have any effect on output, although improved operation time forecast did decrease overtime work. A reduction in the operation time itself is not studied in this article. However, the forecasting model can also be applied to discover which factors are most significant in explaining variation in the length of open-heart surgery. The challenge in scheduling two open-heart surgeries in one day can be partly resolved by increasing the length of the day, decreasing the time between two surgeries or by improving patient scheduling procedures so that two short surgeries can be paired. A linear regression model is created in the paper to increase the accuracy of operation time forecasting and to identify factors that have the most influence on operation time. A simulation model is used to analyse the impact of improved surgical length forecasting and five selected process interventions on productivity in cardiac surgery.
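
    The statement that a forecasting model "explains 46 per cent" of operating time variance corresponds to a coefficient of determination (R²) of 0.46. As a minimal sketch of how that figure is computed from observed and forecast values, with hypothetical operation times in minutes rather than data from the study:

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical operation times (minutes) and forecasts, for illustration only.
observed = [180.0, 240.0, 200.0, 300.0]
forecast = [200.0, 230.0, 210.0, 280.0]
print(f"R^2 = {r_squared(observed, forecast):.3f}")
```

    A forecast that always returns the mean operation time scores 0 on this measure, and a perfect forecast scores 1.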

  11. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    PubMed

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  12. An algebraic method for constructing stable and consistent autoregressive filters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harlim, John, E-mail: jharlim@psu.edu; Department of Meteorology, the Pennsylvania State University, University Park, PA 16802; Hong, Hoon, E-mail: hong@ncsu.edu

    2015-02-15

    In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints of Adams–Bashforth methods of order-two. One attractive feature of this algebraic method is that the model parameters can be obtained without directly knowing any training data set as opposed to many standard, regression-based parameterization methods. It takes only long-time average statistics as inputs. The proposed method provides a discretization time step interval which guarantees the existence of stable and consistent AR model and simultaneously produces the parameters for the AR models. In our numerical examples with two chaotic time series with different characteristics of decaying time scales, we find that the proposed AR models produce significantly more accurate short-term predictive skill and comparable filtering skill relative to the linear regression-based AR models. These encouraging results are robust across wide ranges of discretization times, observation times, and observation noise variances. Finally, we also find that the proposed model produces an improved short-time prediction relative to the linear regression-based AR models in forecasting a data set that characterizes the variability of the Madden–Julian Oscillation, a dominant tropical atmospheric wave pattern.

  13. Least median of squares and iteratively re-weighted least squares as robust linear regression methods for fluorimetric determination of α-lipoic acid in capsules in ideal and non-ideal cases of linearity.

    PubMed

    Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

    2018-06-01

    This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
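
    The IRLS idea named in the abstract above can be sketched for a straight-line calibration as follows. This is a minimal illustration, not the authors' procedure: the Huber weight function, the tuning constant k = 1.345, the median-based scale estimate, and the toy data are all assumptions chosen for the example, since the abstract does not state which weights were used.

```python
import statistics


def weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def irls_line(xs, ys, k=1.345, max_iter=100, tol=1e-10):
    """IRLS with Huber weights (an assumed weight function, see lead-in)."""
    # Start from the ordinary least-squares fit (all weights equal to 1).
    intercept, slope = weighted_line(xs, ys, [1.0] * len(xs))
    for _ in range(max_iter):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = statistics.median([abs(r) for r in residuals]) / 0.6745
        if scale == 0:
            break  # at least half of the points are fitted exactly
        cutoff = k * scale
        # Huber weights: 1 inside the cutoff, shrinking with |residual| outside.
        ws = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in residuals]
        new_intercept, new_slope = weighted_line(xs, ys, ws)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope


# Hypothetical calibration data on y = 1 + 2x with one gross outlier.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[-1] = 100
ols_intercept, ols_slope = weighted_line(xs, ys, [1.0] * len(xs))
rob_intercept, rob_slope = irls_line(xs, ys)
print(f"OLS slope:  {ols_slope:.3f}")
print(f"IRLS slope: {rob_slope:.3f}")
```

    In this toy case the outlier drags the OLS slope far from 2, while the reweighted fit stays close to the line through the remaining points.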

  14. Developing a predictive tropospheric ozone model for Tabriz

    NASA Astrophysics Data System (ADS)

    Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi

    2013-04-01

    Predictive ozone models are becoming indispensable tools by providing a capability for pollution alerts to serve people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, by using the following five modeling strategies: three regression-type methods, namely Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models, namely Nonlinear Local Prediction (NLP) to implement chaos theory and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type modeling strategies explain the data in terms of temperature, solar radiation, dew point temperature, and wind speed, by regressing present ozone values to their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for MLR, ANN and GEP models are not overly good but those produced by NLP and ARIMA are promising for establishing a forecasting capability.

  15. Digital Image Restoration Under a Regression Model - The Unconstrained, Linear Equality and Inequality Constrained Approaches

    DTIC Science & Technology

    1974-01-01

    REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenhas Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two-dimensional form adequately describes the linear model. A discretization is performed by using quadrature methods. By trans

  16. Element enrichment factor calculation using grain-size distribution and functional data regression.

    PubMed

    Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R

    2015-01-01

    In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Marginal regression analysis of recurrent events with coarsened censoring times.

    PubMed

    Hu, X Joan; Rosychuk, Rhonda J

    2016-12-01

    Motivated by an ongoing pediatric mental health care (PMHC) study, this article presents weakly structured methods for analyzing doubly censored recurrent event data where only coarsened information on censoring is available. The study extracted administrative records of emergency department visits from provincial health administrative databases. The available information of each individual subject is limited to a subject-specific time window determined up to concealed data. To evaluate time-dependent effect of exposures, we adapt the local linear estimation with right censored survival times under the Cox regression model with time-varying coefficients (cf. Cai and Sun, Scandinavian Journal of Statistics 2003, 30, 93-111). We establish the pointwise consistency and asymptotic normality of the regression parameter estimator, and examine its performance by simulation. The PMHC study illustrates the proposed approach throughout the article. © 2016, The International Biometric Society.

  18. Who Will Win?: Predicting the Presidential Election Using Linear Regression

    ERIC Educational Resources Information Center

    Lamb, John H.

    2007-01-01

    This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…

  19. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regression is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model; it converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in the GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study how both the newly introduced and the previously existing variables improve the model characteristics, and these statistics determine which model variables to retain or eliminate. The optimal model is thus obtained through a measurement of data fitting effect or a significance test. The simulation and classic time-series data experiment results show that the proposed method is simple, reliable and can be applied to practical engineering.

  20. The Use of Shrinkage Techniques in the Estimation of Attrition Rates for Large Scale Manpower Models

    DTIC Science & Technology

    1988-07-27

    autoregressive model combined with a linear program that solves for the coefficients using MAD. But this success has diminished with time (Rowe..."Harrison-Stevens Forecasting and the Multiprocess Dynamic Linear Model", The American Statistician, v.40, pp. 129-135. 1986. 8. Box, G. E. P. and...1950. 40. McCullagh, P. and Nelder, J., Generalized Linear Models, Chapman and Hall. 1983. 41. McKenzie, E. General Exponential Smoothing and the

  1. Reliability Analysis of the Gradual Degradation of Semiconductor Devices.

    DTIC Science & Technology

    1983-07-20

    under the heading of linear models or linear statistical models.3,4 We have not used this material in this report. Assuming catastrophic failure when...assuming a catastrophic model. In this treatment we first modify our system loss formula and then proceed to the actual analysis. II. ANALYSIS OF...Failure Time 1 Ti Ti 2 T2 T2 n Tn n and are easily analyzed by simple linear regression. Since we have assumed a log normal/Arrhenius activation

  2. The microcomputer scientific software series 2: general linear model--regression.

    Treesearch

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  3. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples using CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.

  4. A new linear least squares method for T1 estimation from SPGR signals with multiple TRs

    NASA Astrophysics Data System (ADS)

    Chang, Lin-Ching; Koay, Cheng Guan; Basser, Peter J.; Pierpaoli, Carlo

    2009-02-01

    The longitudinal relaxation time, T1, can be estimated from two or more spoiled gradient recalled echo (SPGR) images with two or more flip angles and one or more repetition times (TRs). The function relating signal intensity and the parameters is nonlinear; T1 maps can be computed from SPGR signals using nonlinear least squares regression. A widely-used linear method transforms the nonlinear model by assuming a fixed TR in SPGR images. This constraint is not desirable since multiple TRs are a clinically practical way to reduce the total acquisition time, to satisfy the required resolution, and/or to combine SPGR data acquired at different times. A new linear least squares method is proposed using the first order Taylor expansion. Monte Carlo simulations of SPGR experiments are used to evaluate the accuracy and precision of the estimated T1 from the proposed linear and the nonlinear methods. We show that the new linear least squares method provides T1 estimates comparable in both precision and accuracy to those from the nonlinear method, allowing multiple TRs and reducing computation time significantly.
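
    For orientation, the fixed-TR linearization that the abstract calls the "widely-used linear method" can be sketched as below. The abstract does not spell out the signal equation, so the sketch assumes the standard SPGR model S = M0 sin(a) (1 - E1) / (1 - E1 cos(a)) with E1 = exp(-TR/T1), which rearranges to the straight line S/sin(a) = E1 * S/tan(a) + M0 (1 - E1). It is not the new multi-TR method proposed in the paper, and the function names and parameter values are hypothetical.

```python
import math


def spgr_signal(m0, t1, tr, flip_deg):
    """Noise-free SPGR signal under the assumed standard model."""
    e1 = math.exp(-tr / t1)
    a = math.radians(flip_deg)
    return m0 * math.sin(a) * (1 - e1) / (1 - e1 * math.cos(a))


def linear_t1_fixed_tr(signals, flips_deg, tr):
    """Estimate (T1, M0) from signals at several flip angles and one TR.

    Fits y = slope * x + intercept with x = S/tan(a) and y = S/sin(a);
    then slope = E1 and intercept = M0 * (1 - E1).
    """
    xs = [s / math.tan(math.radians(f)) for s, f in zip(signals, flips_deg)]
    ys = [s / math.sin(math.radians(f)) for s, f in zip(signals, flips_deg)]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    t1 = -tr / math.log(slope)
    m0 = intercept / (1 - slope)
    return t1, m0


# Hypothetical acquisition: TR = 15 ms, true T1 = 1000 ms, M0 = 1000 (a.u.).
tr_ms = 15.0
flips = [2, 5, 10, 15, 20]
signals = [spgr_signal(1000.0, 1000.0, tr_ms, f) for f in flips]
t1_est, m0_est = linear_t1_fixed_tr(signals, flips, tr_ms)
print(f"T1 = {t1_est:.3f} ms, M0 = {m0_est:.3f}")
```

    With noise-free input the straight-line fit recovers the simulated parameters; the transformation only works because every signal shares the same TR, which is the restriction the abstract sets out to remove.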

  5. A model of the human in a cognitive prediction task.

    NASA Technical Reports Server (NTRS)

    Rouse, W. B.

    1973-01-01

    The human decision maker's behavior when predicting future states of discrete linear dynamic systems driven by zero-mean Gaussian processes is modeled. The task is on a slow enough time scale that physiological constraints are insignificant compared with cognitive limitations. The model is basically a linear regression system identifier with a limited memory and noisy observations. Experimental data are presented and compared to the model.

  6. [Ultrasonic measurements of fetal thalamus, caudate nucleus and lenticular nucleus in prenatal diagnosis].

    PubMed

    Yang, Ruiqi; Wang, Fei; Zhang, Jialing; Zhu, Chonglei; Fan, Limei

    2015-05-19

    To establish the reference values of thalamus, caudate nucleus and lenticular nucleus diameters through the fetal thalamic transverse section. A total of 265 fetuses at our hospital were randomly selected from November 2012 to August 2014, and the transverse and length diameters of the thalamus, caudate nucleus and lenticular nucleus were measured. SPSS 19.0 statistical software was used to calculate the regression curves of fetal diameter against gestational week. P < 0.05 was considered statistically significant. The linear regression equation of fetal thalamic length diameter and gestational week was Y = 0.051X + 0.201, R = 0.876; that of thalamic transverse diameter and gestational week was Y = 0.031X + 0.229, R = 0.817; that of the length diameter of the head of the caudate nucleus and gestational week was Y = 0.033X + 0.101, R = 0.722; that of the transverse diameter of the head of the caudate nucleus and gestational week was Y = 0.025X - 0.046, R = 0.711; that of lentiform nucleus length diameter and gestational week was Y = 0.046X + 0.229, R = 0.765; and that of lentiform nucleus transverse diameter and gestational week was Y = 0.025X - 0.05, R = 0.772. Ultrasonic measurement of the diameters of the fetal thalamus, caudate nucleus, and lenticular nucleus through the thalamic transverse section is simple and convenient. The measurements increase with gestational week, and there is a linear regression relationship between them.

  7. Orthogonal Regression: A Teaching Perspective

    ERIC Educational Resources Information Center

    Carr, James R.

    2012-01-01

    A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
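
    The contrast between ordinary least squares and the major axis described above can be sketched as follows. The closed-form major-axis slope used here is the standard one derived from the sample variances and covariance; the function names and the four data points are illustrative assumptions, not material from the article.

```python
import math


def centered_sums(xs, ys):
    """Return the means and the centered sums of squares and cross-products."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def ols(xs, ys):
    """Ordinary least squares: minimizes squared vertical distances."""
    mx, my, sxx, _, sxy = centered_sums(xs, ys)
    slope = sxy / sxx
    return my - slope * mx, slope


def major_axis(xs, ys):
    """Orthogonal regression: minimizes squared perpendicular distances."""
    mx, my, sxx, syy, sxy = centered_sums(xs, ys)
    if sxy == 0:
        raise ValueError("major-axis slope is undefined when the covariance is zero")
    diff = syy - sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope


# Illustrative data with scatter in both coordinates.
xs = [0, 1, 2, 3]
ys = [0, 2, 1, 3]
ols_intercept, ols_slope = ols(xs, ys)
ma_intercept, ma_slope = major_axis(xs, ys)
print(f"OLS:        y = {ols_intercept:.3f} + {ols_slope:.3f} x")
print(f"Major axis: y = {ma_intercept:.3f} + {ma_slope:.3f} x")
```

    Unlike OLS, the major axis treats the two variables symmetrically: swapping xs and ys yields the reciprocal slope, so the fitted line itself does not change.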

  8. The use of experimental design in the development of an HPLC-ECD method for the analysis of captopril.

    PubMed

    Khamanga, Sandile M; Walker, Roderick B

    2011-01-15

    An accurate, sensitive and specific high performance liquid chromatography-electrochemical detection (HPLC-ECD) method that was developed and validated for captopril (CPT) is presented. Separation was achieved using a Phenomenex® Luna 5 μm (C18) column and a mobile phase comprised of phosphate buffer (adjusted to pH 3.0): acetonitrile in a ratio of 70:30 (v/v). Detection was accomplished using a full scan multi channel ESA Coulometric detector in the "oxidative-screen" mode with the upstream electrode (E1) set at +600 mV and the downstream (analytical) electrode (E2) set at +950 mV, while the potential of the guard cell was maintained at +1050 mV. The detector gain was set at 300. Experimental design using central composite design (CCD) was used to facilitate method development. Mobile phase pH, molarity and concentration of acetonitrile (ACN) were considered the critical factors to be studied to establish the retention time of CPT and cyclizine (CYC) that was used as the internal standard. Twenty experiments including centre points were undertaken and a quadratic model was derived for the retention time for CPT using the experimental data. The method was validated for linearity, accuracy, precision, limits of quantitation and detection, as per the ICH guidelines. The system was found to produce sharp and well-resolved peaks for CPT and CYC with retention times of 3.08 and 7.56 min, respectively. Linear regression analysis for the calibration curve showed a good linear relationship with a regression coefficient of 0.978 in the concentration range of 2-70 μg/mL. The linear regression equation was y=0.0131x+0.0275. The limits of quantitation (LOQ) and detection (LOD) were found to be 2.27 and 0.6 μg/mL, respectively. The method was used to analyze CPT in tablets. The wide range for linearity, accuracy, sensitivity, short retention time and composition of the mobile phase indicated that this method is better for the quantification of CPT than the pharmacopoeial methods. Copyright © 2010 Elsevier B.V. All rights reserved.

  9. Practical Session: Simple Linear Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    Two exercises are proposed to illustrate simple linear regression. The first one is based on Galton's famous data set on heredity. We use the R command lm and obtain coefficient estimates, the standard error of the error, R2, residuals … In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and look ahead to multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).

  10. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  11. Morse Code, Scrabble, and the Alphabet

    ERIC Educational Resources Information Center

    Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss

    2004-01-01

    In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…

  12. Analyzing Seasonal Variations in Suicide With Fourier Poisson Time-Series Regression: A Registry-Based Study From Norway, 1969-2007.

    PubMed

    Bramness, Jørgen G; Walby, Fredrik A; Morken, Gunnar; Røislien, Jo

    2015-08-01

    Seasonal variation in the number of suicides has long been acknowledged. It has been suggested that this seasonality has declined in recent years, but studies have generally used statistical methods incapable of confirming this. We examined all suicides occurring in Norway during 1969-2007 (more than 20,000 suicides in total) to establish whether seasonality decreased over time. Fitting of additive Fourier Poisson time-series regression models allowed for formal testing of a possible linear decrease in seasonality, or a reduction at a specific point in time, while adjusting for a possible smooth nonlinear long-term change without having to categorize time into discrete yearly units. The models were compared using Akaike's Information Criterion and analysis of variance. A model with a seasonal pattern was significantly superior to a model without one. There was a reduction in seasonality during the period. The model assuming a linear decrease in seasonality and the model assuming a change at a specific point in time were both superior to a model assuming constant seasonality, thus confirming by formal statistical testing that the magnitude of the seasonality in suicides has diminished. The additive Fourier Poisson time-series regression model would also be useful for studying other temporal phenomena with seasonal components. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  13. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  14. Reversed inverse regression for the univariate linear calibration and its statistical properties derived using a new methodology

    NASA Astrophysics Data System (ADS)

    Kang, Pilsang; Koo, Changhoi; Roh, Hokyu

    2017-11-01

    Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
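
    The two existing approaches that the abstract names can be sketched as below, using their textbook definitions: "classical" calibration regresses the measurement y on the standard x and inverts the fitted line, while "inverse" calibration regresses x on y and predicts directly. The paper's own "reversed inverse regression" is not defined in the abstract and is not attempted here; the function names and data are hypothetical.

```python
def fit_line(us, vs):
    """Ordinary least-squares fit of v = intercept + slope * u."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    slope = suv / suu
    return mv - slope * mu, slope


def classical_estimate(xs, ys, y0):
    """Fit y = a + b*x, then solve the fitted line for x at the new reading y0."""
    a, b = fit_line(xs, ys)
    return (y0 - a) / b


def inverse_estimate(xs, ys, y0):
    """Fit x = c + d*y directly, then predict x at the new reading y0."""
    c, d = fit_line(ys, xs)
    return c + d * y0


# Hypothetical calibration standards (xs) and instrument readings (ys).
xs = [0, 1, 2, 3]
ys = [0, 2, 1, 3]
y_new = 2.5
x_classical = classical_estimate(xs, ys, y_new)
x_inverse = inverse_estimate(xs, ys, y_new)
print(f"classical estimate: {x_classical:.3f}")
print(f"inverse estimate:   {x_inverse:.3f}")
```

    The two estimates coincide only when the points lie exactly on a line; with scatter they differ, which is the discrepancy that motivates comparing calibration approaches.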

  15. A novel multi-target regression framework for time-series prediction of drug efficacy.

    PubMed

    Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin

    2017-01-18

    Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task.

  16. A novel multi-target regression framework for time-series prediction of drug efficacy

    PubMed Central

    Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin

    2017-01-01

    Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task. PMID:28098186

  17. Representational change and strategy use in children's number line estimation during the first years of primary school.

    PubMed

    White, Sonia L J; Szűcs, Dénes

    2012-01-04

    The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also provide a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice.
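
    The "logarithmic and linear regression modeling" step mentioned above is commonly carried out by fitting each child's estimates once against the target number and once against its logarithm, then comparing the fits. The sketch below assumes that comparison is made by R squared, which the abstract does not state, and uses made-up targets and estimates rather than study data.

```python
import math


def r_squared(xs, ys):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def compare_fits(targets, estimates):
    """Return (R^2 of the linear model, R^2 of the logarithmic model)."""
    linear = r_squared(targets, estimates)
    logarithmic = r_squared([math.log(t) for t in targets], estimates)
    return linear, logarithmic


# Made-up example: targets 1..20 and estimates that are compressed at the top.
targets = list(range(1, 21))
estimates = [1 + 6 * math.log(t) for t in targets]
r2_linear, r2_log = compare_fits(targets, estimates)
print(f"linear R^2 = {r2_linear:.3f}, logarithmic R^2 = {r2_log:.3f}")
```

    Targets must be positive for the logarithmic model, so a target of 0 on a 0-20 line would have to be excluded or shifted; how the study handled this is not stated in the abstract.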

  18. Representational change and strategy use in children's number line estimation during the first years of primary school

    PubMed Central

    2012-01-01

    Background The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also provide a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Methods Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Results Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. Conclusion In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice. PMID:22217191

  19. A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

    PubMed

    Ferrari, Alberto; Comelli, Mario

    2016-12-01

    In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. Such clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions, namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. A Simple and Specific Stability-Indicating RP-HPLC Method for Routine Assay of Adefovir Dipivoxil in Bulk and Tablet Dosage Form.

    PubMed

    Darsazan, Bahar; Shafaati, Alireza; Mortazavi, Seyed Alireza; Zarghi, Afshin

    2017-01-01

    A simple and reliable stability-indicating RP-HPLC method was developed and validated for analysis of adefovir dipivoxil (ADV). The chromatographic separation was performed on a C18 column using a mixture of acetonitrile-citrate buffer (10 mM at pH 5.2) 36:64 (%v/v) as mobile phase, at a flow rate of 1.5 mL/min. Detection was carried out at 260 nm and a sharp peak was obtained for ADV at a retention time of 5.8 ± 0.01 min. No interferences were observed from its stress degradation products. The method was validated according to the international guidelines. Linear regression analysis of data for the calibration plot showed a linear relationship between peak area and concentration over the range of 0.5-16 μg/mL; the regression coefficient was 0.9999 and the linear regression equation was y = 24844x - 2941.3. The detection (LOD) and quantification (LOQ) limits were 0.12 and 0.35 μg/mL, respectively. The results proved the method was fast (analysis time less than 7 min), precise, reproducible, and accurate for analysis of ADV over a wide range of concentration. The proposed specific method was used for routine quantification of ADV in pharmaceutical bulk and a tablet dosage form.
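
    A calibration line of this kind is normally used in reverse, to convert a measured peak area into a concentration. The sketch below uses the reported equation and range; the function names are illustrative, and the peak-area units are not stated in the abstract, so they are left unspecified.

```python
SLOPE = 24844.0          # peak area per (ug/mL), from the reported equation
INTERCEPT = -2941.3      # peak area at zero concentration, from the reported equation
LINEAR_RANGE = (0.5, 16.0)  # reported linear range in ug/mL


def peak_area(concentration):
    """Forward calibration: y = 24844x - 2941.3."""
    return SLOPE * concentration + INTERCEPT


def concentration_from_area(area):
    """Inverse calibration: x = (y + 2941.3) / 24844, within the linear range."""
    conc = (area - INTERCEPT) / SLOPE
    low, high = LINEAR_RANGE
    if not low <= conc <= high:
        raise ValueError(f"{conc:.3f} ug/mL lies outside the calibrated range")
    return conc


area = peak_area(8.0)                 # hypothetical 8 ug/mL standard
print(round(concentration_from_area(area), 6))
```

    Rejecting values outside 0.5-16 μg/mL is a design choice of this sketch: the abstract reports linearity only over that range, so extrapolation is not supported by it.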

  1. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    PubMed

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were first analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  2. Year-round measurements of CH4 exchange in a forested drained peatland using automated chambers

    NASA Astrophysics Data System (ADS)

    Korkiakoski, Mika; Koskinen, Markku; Penttilä, Timo; Arffman, Pentti; Ojanen, Paavo; Minkkinen, Kari; Laurila, Tuomas; Lohila, Annalea

    2016-04-01

    Pristine peatlands are usually carbon accumulating ecosystems and sources of methane (CH4). Draining peatlands for forestry increases the thickness of the oxic layer, thus enhancing CH4 oxidation which leads to decreased CH4 emissions. Closed chambers are commonly used in estimating the greenhouse gas exchange between the soil and the atmosphere. However, the closed chamber technique alters the gas concentration gradient, making the concentration development against time non-linear. Selecting the correct fitting method is important as it can be the largest source of uncertainty in flux calculation. We measured CH4 exchange rates and their diurnal and seasonal variations in a nutrient-rich drained peatland located in southern Finland. The original fen was drained for forestry in the 1970s and now the tree stand is a mixture of Scots pine, Norway spruce and Downy birch. Our system consisted of six transparent polycarbonate chambers and stainless steel frames, positioned on different types of field and moss layer. During winter, the frame was raised above the snowpack with extension collars and the height of the snowpack inside the chamber was measured regularly. The chambers were closed hourly and the sample gas was sucked into a cavity ring-down spectrometer and analysed for CH4, CO2 and H2O concentration with 5-second time resolution. The concentration change in time in the beginning of a closure was determined with linear and exponential fits. The results show that linear regression systematically underestimated the CH4 flux when compared to exponential regression by 20-50 %. On the other hand, the exponential regression seemed not to work reliably with small fluxes (< 3.5 μg CH4 m-2 h-1): using exponential regression in such cases typically resulted in anomalously large fluxes and high deviation. Due to these facts, we recommend first calculating the flux with the linear regression and, if the flux is high enough, calculating the flux again using the exponential regression and using this value in later analysis. The forest floor at the site (including the ground vegetation) acted as a CH4 sink most of the time. CH4 emission peaks were occasionally observed, particularly in spring during the snow melt, and during rainfall events in summer. Diurnal variation was observed mainly in summer. The net CH4 exchange for the two-year measurement period in the six chambers varied from -31 to -155 mg CH4 m-2 yr-1, the average being -67 mg CH4 m-2 yr-1. However, this does not include the ditches, which typically act as a significant source of CH4.
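
    Why a straight-line fit tends to underestimate the initial rate of change in a closed chamber can be illustrated with a small simulation. The sketch assumes a simple saturating exponential for the chamber concentration; the parameter values, the 300 s closure length, and the function name `ols_slope` are hypothetical and are not taken from the study.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Assumed chamber model: concentration relaxes from c0 toward c_inf at rate k.
c0, c_inf, k = 2.0, 1.6, 0.004           # ppm, ppm, 1/s (uptake, i.e. a sink)
times = [5.0 * i for i in range(61)]      # 5 s resolution over a 300 s closure
conc = [c_inf + (c0 - c_inf) * math.exp(-k * t) for t in times]

initial_slope = -k * (c0 - c_inf)         # dC/dt at closure time t = 0
linear_slope = ols_slope(times, conc)     # slope from a straight-line fit

print(f"initial slope: {initial_slope:.6f} ppm/s")
print(f"linear-fit slope: {linear_slope:.6f} ppm/s")
print(f"ratio: {linear_slope / initial_slope:.2f}")
```

    Because the simulated curve flattens with time, the straight-line slope is smaller in magnitude than the slope at closure. The size of the gap depends on the assumed `k` and closure length, so the ratio printed here should not be read as reproducing the 20-50 % reported in the abstract.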

  3. Quality of life in breast cancer patients--a quantile regression analysis.

    PubMed

    Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma

    2008-01-01

    Quality of life studies play an important role in health care, especially in chronic diseases, in clinical judgment and in the supply of medical resources. Statistical tools like linear regression are widely used to assess the predictors of quality of life, but when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients using a quantile regression model and to compare the results with linear regression. A cross-sectional study was conducted on 119 breast cancer patients who were admitted and treated in the chemotherapy ward of Namazi hospital in Shiraz. We used the QLQ-C30 questionnaire to assess quality of life in these patients. A quantile regression was employed to assess the associated factors and the results were compared to linear regression. All analyses were carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In contrast to linear regression, financial difficulties were not significant in the quantile regression analysis and dyspnea was only significant for the first quartile. Also, emotional functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.

  4. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  5. PMICALC: an R code-based software for estimating post-mortem interval (PMI) compatible with Windows, Mac and Linux operating systems.

    PubMed

    Muñoz-Barús, José I; Rodríguez-Calvo, María Sol; Suárez-Peñaranda, José M; Vieira, Duarte N; Cadarso-Suárez, Carmen; Febrero-Bande, Manuel

    2010-01-30

    In legal medicine the correct determination of the time of death is of utmost importance. Recent advances in estimating post-mortem interval (PMI) have made use of vitreous humour chemistry in conjunction with Linear Regression, but the results are questionable. In this paper we present PMICALC, an R code-based freeware package which estimates PMI in cadavers of recent death by measuring the concentrations of potassium ([K+]), hypoxanthine ([Hx]) and urea ([U]) in the vitreous humor using two different regression models: Additive Models (AM) and Support Vector Machine (SVM), which offer more flexibility than the previously used Linear Regression. The results from both models are better than those published to date and can give numerical expression of PMI with confidence intervals and graphic support within 20 min. The program also takes into account the cause of death. 2009 Elsevier Ireland Ltd. All rights reserved.

  6. Serum Iron Level Is Associated with Time to Antibiotics in Cystic Fibrosis.

    PubMed

    Gifford, Alex H; Dorman, Dana B; Moulton, Lisa A; Helm, Jennifer E; Griffin, Mary M; MacKenzie, Todd A

    2015-12-01

    Serum levels of hepcidin-25, a peptide hormone that reduces blood iron content, are elevated when patients with cystic fibrosis (CF) develop pulmonary exacerbation (PEx). Because hepcidin-25 is unavailable as a clinical laboratory test, we questioned whether a one-time serum iron level was associated with the subsequent number of days until PEx, as defined by the need to receive systemic antibiotics (ABX) for health deterioration. Clinical, biochemical, and microbiological parameters were simultaneously checked in 54 adults with CF. Charts were reviewed to determine when they first experienced a PEx after these parameters were assessed. Time to ABX was compared in subgroups with and without specific attributes. Multivariate linear regression was used to identify parameters that significantly explained variation in time to ABX. In univariate analyses, time to ABX was significantly shorter in subjects with Aspergillus-positive sputum cultures and CF-related diabetes. Multivariate linear regression models demonstrated that shorter time to ABX was associated with younger age, lower serum iron level, and Aspergillus sputum culture positivity. Serum iron, age, and Aspergillus sputum culture positivity are factors associated with shorter time to subsequent PEx in CF adults. © 2015 Wiley Periodicals, Inc.

  7. Changes in aerobic power of men, ages 25-70 yr

    NASA Technical Reports Server (NTRS)

    Jackson, A. S.; Beard, E. F.; Wier, L. T.; Ross, R. M.; Stuteville, J. E.; Blair, S. N.

    1995-01-01

    This study quantified and compared the cross-sectional and longitudinal influence of age, self-report physical activity (SR-PA), and body composition (%fat) on the decline of maximal aerobic power (VO2peak). The cross-sectional sample consisted of 1,499 healthy men ages 25-70 yr. The 156 men of the longitudinal sample were from the same population and examined twice; the mean time between tests was 4.1 (+/- 1.2) yr. Peak oxygen uptake was determined by indirect calorimetry during a maximal treadmill exercise test. The zero-order correlations between VO2peak and %fat (r = -0.62) and SR-PA (r = 0.58) were significantly (P < 0.05) higher than the age correlation (r = -0.45). Linear regression defined the cross-sectional age-related decline in VO2peak at 0.46 ml.kg-1.min-1.yr-1. Multiple regression analysis (R = 0.79) showed that nearly 50% of this cross-sectional decline was due to %fat and SR-PA; adding these lifestyle variables to the multiple regression model reduced the age regression weight to -0.26 ml.kg-1.min-1.yr-1. Statistically controlling for time differences between tests, general linear models analysis showed that longitudinal changes in aerobic power were due to independent changes in %fat and SR-PA, confirming the cross-sectional results.

  8. Postmolar gestational trophoblastic neoplasia: beyond the traditional risk factors.

    PubMed

    Bakhtiyari, Mahmood; Mirzamoradi, Masoumeh; Kimyaiee, Parichehr; Aghaie, Abbas; Mansournia, Mohammd Ali; Ashrafi-Vand, Sepideh; Sarfjoo, Fatemeh Sadat

    2015-09-01

    To investigate the slope of linear regression of postevacuation serum hCG as an independent risk factor for postmolar gestational trophoblastic neoplasia (GTN). Multicenter retrospective cohort study. Academic referral health care centers. All subjects with confirmed hydatidiform mole and at least four measurements of β-hCG titer. None. Type and magnitude of the relationship between the slope of linear regression of β-hCG as a new risk factor and GTN using Bayesian logistic regression with penalized log-likelihood estimation. Among the high-risk and low-risk molar pregnancy cases, 11 (18.6%) and 19 cases (13.3%) had GTN, respectively. No significant relationship was found between the components of a high-risk pregnancy and GTN. The β-hCG return slope was higher in the spontaneous cure group. However, the initial level of this hormone in the first measurement was higher in the GTN group than in the spontaneous recovery group. The average time for diagnosing GTN in the high-risk molar pregnancy group was 2 weeks less than that of the low-risk molar pregnancy group. In addition to slope of linear regression of β-hCG (odds ratio [OR], 12.74; confidence interval [CI], 5.42-29.2), abortion history (OR, 2.53; 95% CI, 1.27-5.04) and large uterine height for gestational age (OR, 1.26; CI, 1.04-1.54) had the maximum effects on GTN outcome, respectively. The slope of linear regression of β-hCG was introduced as an independent risk factor, which could be used for clinical decision making based on records of β-hCG titer and subsequent prevention program. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  9. Application of empirical mode decomposition with local linear quantile regression in financial time series forecasting.

    PubMed

    Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M

    2014-01-01

    This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LLQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LLQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.

  10. Linear-regression convolutional neural network for fully automated coronary lumen segmentation in intravascular optical coherence tomography

    NASA Astrophysics Data System (ADS)

    Yong, Yan Ling; Tan, Li Kuo; McLaughlin, Robert A.; Chee, Kok Han; Liew, Yih Miin

    2017-12-01

    Intravascular optical coherence tomography (OCT) is an optical imaging modality commonly used in the assessment of coronary artery diseases during percutaneous coronary intervention. Manual segmentation to assess luminal stenosis from OCT pullback scans is challenging and time consuming. We propose a linear-regression convolutional neural network to automatically perform vessel lumen segmentation, parameterized in terms of radial distances from the catheter centroid in polar space. Benchmarked against gold-standard manual segmentation, our proposed algorithm achieves average locational accuracy of the vessel wall of 22 microns, and 0.985 and 0.970 in Dice coefficient and Jaccard similarity index, respectively. The average absolute error of luminal area estimation is 1.38%. The processing rate is 40.6 ms per image, suggesting the potential to be incorporated into a clinical workflow and to provide quantitative assessment of vessel lumen in an intraoperative time frame.
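
    The two overlap metrics reported above can be computed for a pair of binary masks as in the following sketch. The masks are represented as sets of pixel coordinates, and the example regions are made up for illustration; they are not data from the study.

```python
def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard similarity index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical 10x10 lumen masks, the second shifted by one column.
automatic = {(r, c) for r in range(10) for c in range(10)}
manual = {(r, c) for r in range(10) for c in range(1, 11)}

d = dice(automatic, manual)
j = jaccard(automatic, manual)
print(d)                      # 0.9
print(abs(j - d / (2 - d)) < 1e-12)  # True
```

    For a single pair of masks the two metrics are linked by J = D / (2 - D), which is why they always rank segmentations identically. Averages over many images, such as those quoted in the abstract, need not satisfy this identity exactly.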

  11. Learning curve of single port laparoscopic cholecystectomy determined using the non-linear ordinary least squares method based on a non-linear regression model: An analysis of 150 consecutive patients.

    PubMed

    Han, Hyung Joon; Choi, Sae Byeol; Park, Man Sik; Lee, Jin Suk; Kim, Wan Bae; Song, Tae Jin; Choi, Sang Yong

    2011-07-01

    Single port laparoscopic surgery has come to the forefront of minimally invasive surgery. For those familiar with conventional techniques, however, this type of operation demands a different type of eye/hand coordination and involves unfamiliar working instruments. Herein, the authors describe the learning curve and the clinical outcomes of single port laparoscopic cholecystectomy for 150 consecutive patients with benign gallbladder disease. All patients underwent single port laparoscopic cholecystectomy using a homemade glove port by one of five operators with different levels of experience in laparoscopic surgery. The learning curve for each operator was fitted using the non-linear ordinary least squares method based on a non-linear regression model. Mean operating time was 77.6 ± 28.5 min. Fourteen patients (6.0%) were converted to conventional laparoscopic cholecystectomy. Complications occurred in 15 patients (10.0%), as follows: bile duct injury (n = 2), surgical site infection (n = 8), seroma (n = 2), and wound pain (n = 3). One operator achieved a learning curve plateau at 61.4 min per procedure after 8.5 cases and his time improved by 95.3 min as compared with the initial operation time. Younger surgeons showed significant decreases in mean operation time and achieved stable mean operation times. In particular, younger surgeons showed significant decreases in operation times after 20 cases. Experienced laparoscopic surgeons can safely perform single port laparoscopic cholecystectomy using conventional or angled laparoscopic instruments. The present study shows that an operator can overcome the single port laparoscopic cholecystectomy learning curve in about eight cases.

  12. Simplified large African carnivore density estimators from track indices.

    PubMed

    Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J

    2016-01-01

    The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. A conventional linear model with intercept may not pass through zero, but may fit the data better than a linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We performed simple linear regression with intercept and simple linear regression through the origin, and used the confidence interval for β in the linear model y = αx + β, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant (P > 0.05). The other four models with intercept and the six models through the origin were all significant (P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for β and the null hypothesis that β = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km² or higher. To improve the current models, we need independent data to validate the models and data to test for a non-linear relationship between track indices and true density at low densities.
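
    The two model forms compared above, and the use of the reported general formula to back-calculate density, can be sketched as follows. The function names are illustrative and the survey numbers in the example are invented; only the coefficient 3.26 comes from the abstract, and the units of track density are not stated there.

```python
def slope_through_origin(x, y):
    """Least-squares slope for y = alpha * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def fit_with_intercept(x, y):
    """Least-squares (alpha, beta) for y = alpha * x + beta."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    return alpha, mean_y - alpha * mean_x


TRACKS_PER_CARNIVORE = 3.26  # reported slope for sandy substrates


def carnivore_density(track_density):
    """Invert track density = 3.26 x carnivore density (carnivores/100 km^2)."""
    return track_density / TRACKS_PER_CARNIVORE


# Invented example survey: carnivore densities and observed track densities.
density = [0.5, 1.0, 2.0, 3.0]
tracks = [1.7, 3.2, 6.6, 9.7]
print(slope_through_origin(density, tracks))
print(fit_with_intercept(density, tracks))
print(carnivore_density(6.52))
```

    Note that the abstract limits the formula to areas with at least 0.27 carnivores/100 km², so estimates that fall below that value should be treated as outside the supported range.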

  13. [From clinical judgment to linear regression model].

    PubMed

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as its first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and is equivalent to the "Y" value when "x" equals 0, and "b" (also called slope) indicates the increase or decrease in "Y" that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of independent variables in the outcome.
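
    The quantities named above (intercept "a", slope "b" and R²) can be computed directly from data with the standard least-squares formulas. The sketch below is a minimal illustration; the function name `fit_line` and the example values are not from the article.

```python
def fit_line(x, y):
    """Least-squares fit of Y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope: change in Y per unit change in x
    a = mean_y - b * mean_x       # intercept: value of Y when x equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot  # share of variation in Y explained by x
    return a, b, r_squared


x = [0, 1, 2, 3, 4]
y = [1.0, 3.0, 5.0, 7.0, 9.0]     # constructed so that Y = 1 + 2x exactly
a, b, r2 = fit_line(x, y)
print(a, b, r2)  # 1.0 2.0 1.0
```

    With real clinical data the points will not lie exactly on a line, so R² will be below 1; it is commonly read as the proportion of variation in the outcome that the fitted line accounts for.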

  14. Statistical approach to the analysis of olive long-term pollen season trends in southern Spain.

    PubMed

    García-Mozo, H; Yaezel, L; Oteros, J; Galán, C

    2014-03-01

    Analysis of long-term airborne pollen counts makes it possible not only to chart pollen-season trends but also to track changing patterns in flowering phenology. Changes in higher plant response over a long interval are considered among the most valuable bioindicators of climate change impact. Phenological-trend models can also provide information regarding crop production and pollen-allergen emission. The value of this information makes the choice of statistical analysis for time series study essential. We analysed trends and variations in the olive flowering season over a 30-year period (1982-2011) in southern Europe (Córdoba, Spain), focussing on: annual Pollen Index (PI); Pollen Season Start (PSS), Peak Date (PD), Pollen Season End (PSE) and Pollen Season Duration (PSD). Apart from the traditional Linear Regression analysis, a Seasonal-Trend Decomposition procedure based on Loess (STL) and an ARIMA model were performed. Linear regression results indicated a trend toward delayed PSE and earlier PSS and PD, probably influenced by the rise in temperature. These changes are provoking longer flowering periods in the study area. The use of the STL technique provided a clearer picture of phenological behaviour. Data decomposition on pollination dynamics enabled the trend toward an alternate bearing cycle to be distinguished from the influence of other stochastic fluctuations. Results pointed to a rising trend in pollen production. With a view toward forecasting future phenological trends, ARIMA models were constructed to predict PSD, PSS and PI until 2016. Projections displayed a better goodness of fit than those derived from linear regression. Findings suggest that the olive reproductive cycle has changed considerably over the last 30 years due to climate change. Further conclusions are that STL improves the effectiveness of traditional linear regression in trend analysis, and ARIMA models can provide reliable trend projections for future years taking into account the internal fluctuations in time series. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Method and Excel VBA Algorithm for Modeling Master Recession Curve Using Trigonometry Approach.

    PubMed

    Posavec, Kristijan; Giacopetti, Marco; Materazzi, Marco; Birk, Steffen

    2017-11-01

    A new method was developed and implemented into an Excel Visual Basic for Applications (VBA) algorithm utilizing trigonometry laws in an innovative way to overlap recession segments of time series and create master recession curves (MRCs). Based on a trigonometry approach, the algorithm horizontally translates succeeding recession segments of time series, placing their vertex, that is, the highest recorded value of each recession segment, directly onto the appropriate connection line defined by measurement points of a preceding recession segment. The new method and algorithm continues the development of methods and algorithms for the generation of MRC, where the first published method was based on a multiple linear/nonlinear regression model approach (Posavec et al. 2006). The newly developed trigonometry-based method was tested on real case study examples and compared with the previously published multiple linear/nonlinear regression model-based method. The results show that in some cases, that is, for some time series, the trigonometry-based method creates narrower overlaps of the recession segments, resulting in higher coefficients of determination R², while in other cases the multiple linear/nonlinear regression model-based method remains superior. The Excel VBA algorithm for modeling MRC using the trigonometry approach is implemented into a spreadsheet tool (MRCTools v3.0 written by and available from Kristijan Posavec, Zagreb, Croatia) containing the previously published VBA algorithms for MRC generation and separation. All algorithms within the MRCTools v3.0 are open access and available free of charge, supporting the idea of running science on available, open, and free of charge software. © 2017, National Ground Water Association.

  16. Time Advice and Learning Questions in Computer Simulations

    ERIC Educational Resources Information Center

    Rey, Gunter Daniel

    2011-01-01

    Students (N = 101) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without time advice) x 3 (with learning questions and corrective feedback, with…

  17. The Prediction of Achievement and Time Spent in Instruction in a Self-Paced Individualized Course.

    ERIC Educational Resources Information Center

    Franklin, Thomas E.

    Multiple linear regressions were employed to determine the relative contributions of cognitive and affective variables accounting for variance in college students' achievement and amount of time taken to complete a self-paced, individualized course. Study habits and attitudes (SSHA) made greater relative contributions to explaining total course…

  18. Fourier transform infrared reflectance spectra of latent fingerprints: a biometric gauge for the age of an individual.

    PubMed

    Hemmila, April; McGill, Jim; Ritter, David

    2008-03-01

    To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.

  19. Linearity versus Nonlinearity of Offspring-Parent Regression: An Experimental Study of Drosophila Melanogaster

    PubMed Central

    Gimelfarb, A.; Willis, J. H.

    1994-01-01

    An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818

  20. Personal Best Time, Percent Body Fat, and Training Are Differently Associated with Race Time for Male and Female Ironman Triathletes

    ERIC Educational Resources Information Center

    Knechtle, Beat; Wirth, Andrea; Baumann, Barbara; Knechtle, Patrizia; Rosemann, Thomas

    2010-01-01

    We studied male and female nonprofessional Ironman triathletes to determine whether percent body fat, training, and/or previous race experience were associated with race performance. We used simple linear regression analysis, with total race time as the dependent variable, to investigate the relationship among athletes' percent body fat, average…

  1. Quantile Regression Models for Current Status Data

    PubMed Central

    Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen

    2016-01-01

    Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging. PMID:27994307

  2. An approach to checking case-crossover analyses based on equivalence with time-series methods.

    PubMed

    Lu, Yun; Symons, James Morel; Geyh, Alison S; Zeger, Scott L

    2008-03-01

    The case-crossover design has been increasingly applied to epidemiologic investigations of acute adverse health effects associated with ambient air pollution. The correspondence of the design to that of matched case-control studies makes it inferentially appealing for epidemiologic studies. Case-crossover analyses generally use conditional logistic regression modeling. This technique is equivalent to time-series log-linear regression models when there is a common exposure across individuals, as in air pollution studies. Previous methods for obtaining unbiased estimates for case-crossover analyses have assumed that time-varying risk factors are constant within reference windows. In this paper, we rely on the connection between case-crossover and time-series methods to illustrate model-checking procedures from log-linear model diagnostics for time-stratified case-crossover analyses. Additionally, we compare the relative performance of the time-stratified case-crossover approach to time-series methods under 3 simulated scenarios representing different temporal patterns of daily mortality associated with air pollution in Chicago, Illinois, during 1995 and 1996. Whenever a model, be it time-series or case-crossover, fails to account appropriately for fluctuations in time that confound the exposure, the effect estimate will be biased. It is therefore important to perform model-checking in time-stratified case-crossover analyses rather than assume the estimator is unbiased.

  3. Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.

    PubMed

    Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C

    2014-03-01

    In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data, providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve performance similar to KRR at much lower computational costs. ME in particular, a physiologically inspired extension of linear regression, represents a promising candidate for the next generation of prosthetic devices.

  4. Environmental factors and flow paths related to Escherichia coli concentrations at two beaches on Lake St. Clair, Michigan, 2002–2005

    USGS Publications Warehouse

    Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.

    2008-01-01

    Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. 
    A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
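
    The ROC cut-point selection mentioned in this record can be illustrated with a minimal sketch. It assumes Youden's J statistic (true positive rate minus false positive rate) as the selection criterion, which the abstract does not name; the function name, scores, and labels below are hypothetical illustration values, not data from the study.

```python
def youden_cut_point(scores, labels):
    """Return (threshold, J) maximizing J = TPR - FPR.

    scores: predicted exceedance probabilities (hypothetical values).
    labels: 1 if the observed concentration exceeded the limit, else 0.
    A sample is predicted positive when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        j = tp / positives - fp / negatives
        # Keep the first threshold that attains the largest J
        if best is None or j > best[1]:
            best = (thr, j)
    return best


# Hypothetical model outputs and observed exceedances
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
threshold, j_stat = youden_cut_point(scores, labels)
print(threshold, j_stat)
```

    Other criteria, such as weighting false negatives more heavily for public-health advisories, would lead to a different cut point.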

  5. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  6. Effects of integration time on in-water radiometric profiles.

    PubMed

    D'Alimonte, Davide; Zibordi, Giuseppe; Kajiyama, Tamito

    2018-03-05

    This work investigates the effects of integration time on in-water downward irradiance E_d, upward irradiance E_u and upwelling radiance L_u profile data acquired with free-fall hyperspectral systems. Analyzed quantities are the subsurface value and the diffuse attenuation coefficient derived by applying linear and non-linear regression schemes. Case studies include oligotrophic waters (Case-1), as well as waters dominated by Colored Dissolved Organic Matter (CDOM) and Non-Algal Particles (NAP). Assuming a 24-bit digitization, measurements resulting from the accumulation of photons over integration times varying between 8 and 2048 ms are evaluated at depths corresponding to: 1) the beginning of each integration interval (Fst); 2) the end of each integration interval (Lst); 3) the averages of Fst and Lst values (Avg); and finally 4) the values weighted accounting for the diffuse attenuation coefficient of water (Wgt). Statistical figures show that the effects of integration time can bias results well above 5% as a function of the depth definition. Results indicate the validity of the Wgt depth definition and the fair applicability of the Avg one. Instead, both the Fst and Lst depths should not be adopted since they may introduce pronounced biases in E_u and L_u regression products for highly absorbing waters. Finally, the study reconfirms the relevance of combining multiple radiometric casts into a single profile to increase precision of regression products.

  7. Modeling Longitudinal Data Containing Non-Normal Within Subject Errors

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan; Glenn, Nancy L.

    2013-01-01

    The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively small number of subjects, typically 10 to 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed-effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed-effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.

  8. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    PubMed

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects, and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.

  9. TSS concentration in sewers estimated from turbidity measurements by means of linear regression accounting for uncertainties in both variables.

    PubMed

    Bertrand-Krajewski, J L

    2004-01-01

    In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers, by means of sensor- and site-specific empirical equations established by linear regression of on-site turbidity values T with TSS concentrations C measured in corresponding samples. As the ordinary least-squares method is not able to account for measurement uncertainties in both T and C variables, an appropriate regression method is used to solve this difficulty and to evaluate correctly the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of variances and covariance in the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. In order to show how the established regression could be used, an independent 24-hour dry weather turbidity data series recorded at a 2 min time interval is used, transformed into estimated TSS concentrations, and compared to TSS concentrations measured in samples. The comparison appears satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.
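
    The idea of a regression that allows for measurement error in both variables can be sketched as follows. The abstract does not name its method, so this example swaps in Deming regression with an assumed known ratio `delta` of the error variance in C to the error variance in T; the function name and the data are hypothetical, and the sketch does not reproduce the paper's variance and covariance calculations.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x.

    delta is the assumed ratio (error variance in y) / (error variance in x).
    Returns (slope, intercept). Requires a non-zero sample covariance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical turbidity readings (T) and TSS concentrations (C)
turbidity = [50.0, 80.0, 120.0, 160.0, 210.0]
tss = [62.0, 95.0, 151.0, 190.0, 262.0]
slope, intercept = deming_fit(turbidity, tss, delta=1.0)
print(slope, intercept)
```

    With `delta=1.0` the fit minimizes perpendicular distances (orthogonal regression); other values of `delta` would be chosen from the known uncertainties of the sensor and of the laboratory analysis.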

  10. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression

    PubMed Central

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-01-01

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects, and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale. PMID:28397745

  11. Compound Identification Using Penalized Linear Regression on Metabolomics

    PubMed Central

    Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho

    2014-01-01

    Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894

  12. Modeling absolute differences in life expectancy with a censored skew-normal regression approach

    PubMed Central

    Clough-Gorr, Kerri; Zwahlen, Marcel

    2015-01-01

    Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544

  13. Linear regression models and k-means clustering for statistical analysis of fNIRS data.

    PubMed

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-02-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performance in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.

  14. Linear regression models and k-means clustering for statistical analysis of fNIRS data

    PubMed Central

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-01-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performance in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751

  15. Control Variate Selection for Multiresponse Simulation.

    DTIC Science & Technology

    1987-05-01

    M. H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F., Probability, Allyn and Bacon...1982. Neter, J., W. Wasserman, and M. H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F...Aspects of Multivariate Statistical Theory, John Wiley and Sons, New York, New York, 1982. Neter, J., W. Wasserman, and M. H. Kutner, Applied Linear Regression Models

  16. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

    ERIC Educational Resources Information Center

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  17. High correlations between MRI brain volume measurements based on NeuroQuant® and FreeSurfer.

    PubMed

    Ross, David E; Ochs, Alfred L; Tate, David F; Tokac, Umit; Seabaugh, John; Abildskov, Tracy J; Bigler, Erin D

    2018-05-30

    NeuroQuant® (NQ) and FreeSurfer (FS) are commonly used computer-automated programs for measuring MRI brain volume. Previously they were reported to have high intermethod reliabilities but often large intermethod effect size differences. We hypothesized that linear transformations could be used to reduce the large effect sizes. This study was an extension of our previously reported study. We performed NQ and FS brain volume measurements on 60 subjects (including normal controls, patients with traumatic brain injury, and patients with Alzheimer's disease). We used two statistical approaches in parallel to develop methods for transforming FS volumes into NQ volumes: traditional linear regression, and Bayesian linear regression. For both methods, we used regression analyses to develop linear transformations of the FS volumes to make them more similar to the NQ volumes. The FS-to-NQ transformations based on traditional linear regression resulted in effect sizes which were small to moderate. The transformations based on Bayesian linear regression resulted in all effect sizes being trivially small. To our knowledge, this is the first report describing a method for transforming FS to NQ data so as to achieve high reliability and low effect size differences. Machine learning methods like Bayesian regression may be more useful than traditional methods. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. Quantile Regression in the Study of Developmental Sciences

    PubMed Central

    Petscher, Yaacov; Logan, Jessica A. R.

    2014-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
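
    The contrast drawn in this record between estimating the average and estimating other points of the outcome's distribution can be sketched with the check (pinball) loss that quantile regression minimizes. This is a minimal intercept-only illustration with hypothetical data and function names; a full quantile regression would minimize the same loss over regression coefficients rather than a single constant.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss for one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(y, tau):
    """Observed value that minimizes the total check loss at level tau."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: sum(check_loss(v - c, tau) for v in y))


# Hypothetical outcome with one extreme value
outcome = [1, 2, 3, 4, 100]
mean_fit = sum(outcome) / len(outcome)      # what a least-squares intercept targets
median_fit = best_constant(outcome, 0.5)    # what a tau = 0.5 quantile fit targets
lower_fit = best_constant(outcome, 0.25)    # a lower point of the distribution
print(mean_fit, median_fit, lower_fit)
```

    The extreme value pulls the mean far from the bulk of the data, while the quantile fits describe specific points of the distribution.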

  19. Quantum algorithm for linear regression

    NASA Astrophysics Data System (ADS)

    Wang, Guoming

    2017-07-01

    We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Differently from previous algorithms which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in the classical form. So by running it once, one completely determines the fitted model and then can use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log2(N), d, κ, 1/ε), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ε is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.

  20. Modeling thermal sensation in a Mediterranean climate—a comparison of linear and ordinal models

    NASA Astrophysics Data System (ADS)

    Pantavou, Katerina; Lykoudis, Spyridon

    2014-08-01

    A simple thermo-physiological model of outdoor thermal sensation adjusted with psychological factors is developed aiming to predict thermal sensation in Mediterranean climates. Microclimatic measurements simultaneously with interviews on personal and psychological conditions were carried out in a square, a street canyon and a coastal location of the greater urban area of Athens, Greece. Multiple linear and ordinal regression were applied in order to estimate thermal sensation making allowance for all the recorded parameters or specific, empirically selected, subsets producing so-called extensive and empirical models, respectively. Meteorological, thermo-physiological and overall models - considering psychological factors as well - were developed. Predictions were improved when personal and psychological factors were taken into account as compared to meteorological models. The model based on ordinal regression reproduced extreme values of thermal sensation vote more adequately than the linear regression one, while the empirical model produced satisfactory results in relation to the extensive model. The effects of adaptation and expectation on thermal sensation vote were introduced in the models by means of the exposure time, season and preference related to air temperature and irradiation. The assessment of thermal sensation could be a useful criterion in decision making regarding public health, outdoor spaces planning and tourism.

  1. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing

    PubMed Central

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-01-01

    Aims: A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods: We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results: Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions: The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
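
    The Bland-Altman comparison named in this record can be sketched as below. This is the textbook form (mean difference as constant bias, with limits of agreement at 1.96 standard deviations), not the adapted version the authors describe; the function name and the paired assay values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(assay_a, assay_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences a - b; the limits of
    agreement are bias -/+ 1.96 sample standard deviations.
    """
    differences = [a - b for a, b in zip(assay_a, assay_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Hypothetical paired quantitative values from two assays
new_assay = [10.0, 12.0, 14.0, 16.0]
reference = [9.0, 11.5, 13.0, 15.5]
bias, lower, upper = bland_altman(new_assay, reference)
print(bias, lower, upper)
```

    A non-zero bias points to constant error. Proportional error would appear as a trend in the differences when they are plotted against the pairwise means, which this sketch does not compute.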

  2. Aspects of porosity prediction using multivariate linear regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Byrnes, A.P.; Wilson, M.D.

    1991-03-01

    Highly accurate multiple linear regression models have been developed for sandstones of diverse compositions. Porosity reduction or enhancement processes are controlled by the fundamental variables, Pressure (P), Temperature (T), Time (t), and Composition (X), where composition includes mineralogy, size, sorting, fluid composition, etc. The multiple linear regression equation, of which all linear porosity prediction models are subsets, takes the generalized form: Porosity = C_0 + C_1(P) + C_2(T) + C_3(X) + C_4(t) + C_5(PT) + C_6(PX) + C_7(Pt) + C_8(TX) + C_9(Tt) + C_10(Xt) + C_11(PTX) + C_12(PXt) + C_13(PTt) + C_14(TXt) + C_15(PTXt). The first four primary variables are often interactive, thus requiring terms involving two or more primary variables (the form shown implies interaction and not necessarily multiplication). The final terms used may also involve simple mathematic transforms such as log X, e^T, X^2, or more complex transformations such as the Time-Temperature Index (TTI). The X term in the equation above represents a suite of compositional variables and, therefore, a fully expanded equation may include a series of terms incorporating these variables. Numerous published bivariate porosity prediction models involving P (or depth) or Tt (TTI) are effective to a degree, largely because of the high degree of colinearity between P and TTI. However, all such bivariate models ignore the unique contributions of P and Tt, as well as various X terms. These simpler models become poor predictors in regions where colinear relations change, where important variables have been ignored, or where the database does not include a sufficient range or weight distribution for the critical variables.
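
    The 16-term generalized form in this record can be enumerated mechanically. The sketch below builds one design-matrix row from values of P, T, X and t, under the loudly stated assumption that each interaction is a simple product; the abstract itself notes that interaction does not necessarily mean multiplication. The function name and input values are hypothetical, and the three-way terms come out in a slightly different order than in the abstract.

```python
from itertools import combinations
from math import prod

VARIABLES = ("P", "T", "X", "t")


def interaction_terms(values):
    """Return (names, row) for the intercept, main effects and all interactions.

    values maps each variable name to a number. Interactions are taken as
    products, which is an assumption made for illustration only.
    """
    names, row = ["1"], [1.0]
    for order in range(1, len(VARIABLES) + 1):
        for combo in combinations(VARIABLES, order):
            names.append("".join(combo))
            row.append(prod(values[v] for v in combo))
    return names, row


# Hypothetical, unitless values for a single sample
names, row = interaction_terms({"P": 2.0, "T": 3.0, "X": 5.0, "t": 7.0})
for name, value in zip(names, row):
    print(name, value)
```

    Stacking such rows for every sample gives the matrix on which the coefficients C_0 to C_15 would be estimated by least squares.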

  3. Considerations for analysis of time-to-event outcomes measured with error: Bias and correction with SIMEX.

    PubMed

    Oh, Eric J; Shepherd, Bryan E; Lumley, Thomas; Shaw, Pamela A

    2018-04-15

    For time-to-event outcomes, a rich literature exists on the bias introduced by covariate measurement error in regression models, such as the Cox model, and methods of analysis to address this bias. By comparison, less attention has been given to understanding the impact of, or addressing, errors in the failure time outcome. For many diseases, the timing of an event of interest (such as progression-free survival or time to AIDS progression) can be difficult to assess or reliant on self-report and therefore prone to measurement error. For linear models, it is well known that random errors in the outcome variable do not bias regression estimates. With nonlinear models, however, even random error or misclassification can introduce bias into estimated parameters. We compare the performance of 2 common regression models, the Cox and Weibull models, in the setting of measurement error in the failure time outcome. We introduce an extension of the SIMEX method to correct for bias in hazard ratio estimates from the Cox model and discuss other analysis options to address measurement error in the response. A formula to estimate the bias induced into the hazard ratio by classical measurement error in the event time for a log-linear survival model is presented. Detailed numerical studies are presented to examine the performance of the proposed SIMEX method under varying levels and parametric forms of the error in the outcome. We further illustrate the method with observational data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic. Copyright © 2017 John Wiley & Sons, Ltd.
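
    The statement that random outcome error leaves linear regression estimates unbiased can be checked with a small simulation. The sketch below is not the SIMEX procedure from the paper; it only fits ordinary least squares twice, once on a clean outcome and once on the same outcome with extra zero-mean noise. The sample size, true slope, and noise levels are arbitrary assumptions.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 20000                       # assumed sample size
true_slope = 3.0                # assumed true effect

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 + true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]
# Add independent, zero-mean measurement error to the outcome only.
y_noisy = [yi + rng.gauss(0.0, 2.0) for yi in y]

slope_clean = ols_slope(x, y)
slope_noisy = ols_slope(x, y_noisy)
print(slope_clean, slope_noisy)  # both should sit close to true_slope
```

    The noisy fit is less precise but centered on the same value, which is the contrast the abstract draws with nonlinear models such as Cox regression.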

  4. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

    EPA Science Inventory

    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

  5. Do clinical and translational science graduate students understand linear regression? Development and early validation of the REGRESS quiz.

    PubMed

    Enders, Felicity

    2013-12-01

    Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 postcourse, and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Postcourse students had similar results to practicing statisticians (mean (SD) of 20.1 (3.5); P = 0.21). Students also showed significant improvement pre/postcourse in each of six domain areas (P < 0.001). The REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.
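
    For readers unfamiliar with the reliability figure quoted above, Cronbach's alpha for $k$ items is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_{\text{total}}^2}\right)$, where $s_i^2$ is the variance of item $i$ and $s_{\text{total}}^2$ is the variance of the summed score. The sketch below applies that formula to a tiny made-up score matrix; the data and the function name `cronbach_alpha` are illustrative assumptions, not the REGRESS data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])   # variance of summed score
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Made-up example: three respondents, three perfectly consistent items.
scores = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(scores)
print(alpha)  # 1.0 for perfectly consistent items
```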

  6. Functional linear models to test for differences in prairie wetland hydraulic gradients

    USGS Publications Warehouse

    Greenwood, Mark C.; Sojda, Richard S.; Preston, Todd M.; Swayne, David A.; Yang, Wanhong; Voinov, A.A.; Rizzoli, A.; Filatova, T.

    2010-01-01

    Functional data analysis provides a framework for analyzing multiple time series measured frequently in time, treating each series as a continuous function of time. Functional linear models are used to test for effects on hydraulic gradient functional responses collected from three types of land use in Northeastern Montana at fourteen locations. Penalized regression-splines are used to estimate the underlying continuous functions based on the discretely recorded (over time) gradient measurements. Permutation methods are used to assess the statistical significance of effects. A method for accommodating missing observations in each time series is described. Hydraulic gradients may be an initial and fundamental ecosystem process that responds to climate change. We suggest other potential uses of these methods for detecting evidence of climate change.

  7. Influence of landscape-scale factors in limiting brook trout populations in Pennsylvania streams

    USGS Publications Warehouse

    Kocovsky, P.M.; Carline, R.F.

    2006-01-01

    Landscapes influence the capacity of streams to produce trout through their effect on water chemistry and other factors at the reach scale. Trout abundance also fluctuates over time; thus, to thoroughly understand how spatial factors at landscape scales affect trout populations, one must assess the changes in populations over time to provide a context for interpreting the importance of spatial factors. We used data from the Pennsylvania Fish and Boat Commission's fisheries management database to investigate spatial factors that affect the capacity of streams to support brook trout Salvelinus fontinalis and to provide models useful for their management. We assessed the relative importance of spatial and temporal variation by calculating variance components and comparing relative standard errors for spatial and temporal variation. We used binary logistic regression to predict the presence of harvestable-length brook trout and multiple linear regression to assess the mechanistic links between landscapes and trout populations and to predict population density. The variance in trout density among streams was equal to or greater than the temporal variation for several streams, indicating that differences among sites affect population density. Logistic regression models correctly predicted the absence of harvestable-length brook trout in 60% of validation samples. The r² value for the linear regression model predicting density was 0.3, indicating low predictive ability. Both logistic and linear regression models supported buffering capacity against acid episodes as an important mechanistic link between landscapes and trout populations. Although our models fail to predict trout densities precisely, their success at elucidating the mechanistic links between landscapes and trout populations, in concert with the importance of spatial variation, increases our understanding of factors affecting brook trout abundance and will help managers and private groups to protect and enhance populations of wild brook trout. © Copyright by the American Fisheries Society 2006.

  8. Forecasting Container Throughput at the Doraleh Port in Djibouti through Time Series Analysis

    NASA Astrophysics Data System (ADS)

    Mohamed Ismael, Hawa; Vandyck, George Kobina

    The Doraleh Container Terminal (DCT) located in Djibouti has been noted as the most technologically advanced container terminal on the African continent. DCT's strategic location at the crossroads of the main shipping lanes connecting Asia, Africa and Europe puts it in a unique position to provide important shipping services to vessels plying that route. This paper aims to forecast container throughput through the Doraleh Container Port in Djibouti by Time Series Analysis. A selection of univariate forecasting models has been used, namely the Triple Exponential Smoothing Model, the Grey Model and the Linear Regression Model. By utilizing the above three models and their combination, the forecast of container throughput through the Doraleh port was realized. A comparison of the different forecasting results of the three models, in addition to the combination forecast, is then undertaken, based on the commonly used evaluation criteria Mean Absolute Deviation (MAD) and Mean Absolute Percentage Error (MAPE). The study found that the Linear Regression forecasting Model was the best prediction method for forecasting the container throughput, since its forecast error was the least. Based on the regression model, a ten (10) year forecast for container throughput at DCT has been made.
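
    The two evaluation criteria named above are simple to compute. The following sketch shows MAD and MAPE on made-up numbers; the throughput values are hypothetical and are not taken from the Doraleh study.

```python
def mean_absolute_deviation(actual, forecast):
    """Average absolute forecast error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mean_absolute_percentage_error(actual, forecast):
    """Average absolute error as a percentage of the actual value."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical throughput figures (for illustration only).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

mad = mean_absolute_deviation(actual, forecast)
mape = mean_absolute_percentage_error(actual, forecast)
print(mad, mape)
```

    MAD keeps the units of the series, while MAPE is scale-free, which is why the two are often reported together when comparing models.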

  9. Pseudo second order kinetics and pseudo isotherms for malachite green onto activated carbon: comparison of linear and non-linear regression methods.

    PubMed

    Kumar, K Vasanth; Sivanesan, S

    2006-08-25

    Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear method. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo second order model were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression method showed that Ho pseudo second order model was a better kinetic expression when compared to other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear method to obtain the pseudo isotherms. The best fitting pseudo isotherm was found to be the Langmuir and Redlich Peterson isotherm. Redlich Peterson is a special case of Langmuir when the constant g equals unity.
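
    To make the linear method concrete: Ho's pseudo second order expression is commonly written $q_t = \frac{k q_e^2 t}{1 + k q_e t}$, which rearranges to the straight line $\frac{t}{q_t} = \frac{1}{k q_e^2} + \frac{t}{q_e}$. The sketch below fits that line by least squares to synthetic, noise-free data and recovers $q_e$ and $k$. The parameter values and time points are assumptions for illustration; the non-linear fit favored by the abstract would need an iterative optimizer and is not shown.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def ho_uptake(t, q_e, k):
    """Pseudo second order uptake q_t at time t."""
    return k * q_e ** 2 * t / (1.0 + k * q_e * t)

# Synthetic data generated from assumed parameters (not experimental values).
q_e_true, k_true = 10.0, 0.05
times = [1.0, 2.0, 5.0, 10.0, 20.0]
uptake = [ho_uptake(t, q_e_true, k_true) for t in times]

# Linearized form: t/q_t = 1/(k*q_e^2) + t/q_e
intercept, slope = ols_fit(times, [t / q for t, q in zip(times, uptake)])
q_e_fit = 1.0 / slope
k_fit = slope ** 2 / intercept
print(q_e_fit, k_fit)
```

    With real, noisy measurements the transformation to t/q_t distorts the error structure, which is one reason linear and non-linear estimates can disagree.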

  10. Instructional Advice, Time Advice and Learning Questions in Computer Simulations

    ERIC Educational Resources Information Center

    Rey, Gunter Daniel

    2010-01-01

    Undergraduate students (N = 97) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without instructional advice) x 2 (with or without time advice) x 2…

  11. Differences in Student Evaluations of Limited-Term Lecturers and Full-Time Faculty

    ERIC Educational Resources Information Center

    Cho, Jeong-Il; Otani, Koichiro; Kim, B. Joon

    2014-01-01

    This study compared student evaluations of teaching (SET) for limited-term lecturers (LTLs) and full-time faculty (FTF) using a Likert-scaled survey administered to students (N = 1,410) at the end of university courses. Data were analyzed using a general linear regression model to investigate the influence of multi-dimensional evaluation items on…

  12. A comparison of radiometric correction techniques in the evaluation of the relationship between LST and NDVI in Landsat imagery.

    PubMed

    Tan, Kok Chooi; Lim, Hwee San; Matjafri, Mohd Zubir; Abdullah, Khiruddin

    2012-06-01

    Atmospheric corrections for multi-temporal optical satellite images are necessary, especially in change detection analyses, such as normalized difference vegetation index (NDVI) rationing. Abrupt change detection analysis using remote-sensing techniques requires radiometric congruity and atmospheric correction to monitor terrestrial surfaces over time. Two atmospheric correction methods were used for this study: relative radiometric normalization and the simplified method for atmospheric correction (SMAC) in the solar spectrum. A multi-temporal data set consisting of two sets of Landsat images from the period between 1991 and 2002 of Penang Island, Malaysia, was used to compare NDVI maps, which were generated using the proposed atmospheric correction methods. Land surface temperature (LST) was retrieved using ATCOR3_T in PCI Geomatica 10.1 image processing software. Linear regression analysis was utilized to analyze the relationship between NDVI and LST. This study reveals that both of the proposed atmospheric correction methods yielded high accuracy through examination of the linear correlation coefficients. To check for the accuracy of the equation obtained through linear regression analysis for every single satellite image, 20 points were randomly chosen. The results showed that the SMAC method yielded a constant value (in terms of error) to predict the NDVI value from linear regression analysis-derived equation. The errors (average) from both proposed atmospheric correction methods were less than 10%.

  13. Commuting and Sleep: Results From the Hispanic Community Health Study/Study of Latinos Sueño Ancillary Study.

    PubMed

    Petrov, Megan E; Weng, Jia; Reid, Kathryn J; Wang, Rui; Ramos, Alberto R; Wallace, Douglas M; Alcantara, Carmela; Cai, Jianwen; Perreira, Krista; Espinoza Giacinto, Rebeca A; Zee, Phyllis C; Sotres-Alvarez, Daniela; Patel, Sanjay R

    2018-03-01

    Commute time is associated with reduced sleep time, but previous studies have relied on self-reported sleep assessment. The present study investigated the relationships between commute time for employment and objective sleep patterns among non-shift working U.S. Hispanic/Latino adults. From 2010 to 2013, Hispanic/Latino employed, non-shift-working adults (n=760, aged 18-64 years) from the Sueño study, ancillary to the Hispanic Community Health Study/Study of Latinos, reported their total daily commute time to and from work, completed questionnaires on sleep and other health behaviors, and wore wrist actigraphs to record sleep duration, continuity, and variability for 1 week. Survey linear regression models of the actigraphic and self-reported sleep measures regressed on categorized commute time (short: 1-44 minutes; moderate: 45-89 minutes; long: ≥90 minutes) were built adjusting for relevant covariates. For associations that suggested a linear relationship, continuous commute time was modeled as the exposure. Moderation effects by age, sex, income, and depressive symptoms also were explored. Commute time was linearly related to sleep duration on work days such that each additional hour of commute time conferred 15 minutes of sleep loss (p=0.01). Compared with short commutes, individuals with moderate commutes had greater sleep duration variability (p=0.04) and lower interdaily stability (p=0.046, a measure of sleep/wake schedule regularity). No significant associations were detected for self-reported sleep measures. Commute time is significantly associated with actigraphy-measured sleep duration and regularity among Hispanic/Latino adults. Interventions to shorten commute times should be evaluated to help improve sleep habits in this minority population. Copyright © 2017 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  14. Isovolumic relaxation period as an index of left ventricular relaxation under different afterload conditions--comparison with the time constant of left ventricular pressure decay in the dog.

    PubMed

    Ochi, H; Ikuma, I; Toda, H; Shimada, T; Morioka, S; Moriyama, K

    1989-12-01

    In order to determine whether isovolumic relaxation period (IRP) reflects left ventricular relaxation under different afterload conditions, 17 anesthetized, open chest dogs were studied, and the left ventricular pressure decay time constant (T) was calculated. In 12 dogs, angiotensin II and nitroprusside were administered, with the heart rate constant at 90 beats/min. Multiple linear regression analysis showed that the aortic dicrotic notch pressure (AoDNP) and T were major determinants of IRP, while left ventricular end-diastolic pressure was a minor determinant. Multiple linear regression analysis, correlating T with IRP and AoDNP, did not further improve the correlation coefficient compared with that between T and IRP. We concluded that correction of the IRP by AoDNP is not necessary to predict T from additional multiple linear regression. The effects of ascending aortic constriction or angiotensin II on IRP were examined in five dogs, after pretreatment with propranolol. Aortic constriction caused a significant decrease in IRP and T, while angiotensin II produced a significant increase in IRP and T. IRP was affected by the change of afterload. However, the IRP and T values were always altered in the same direction. These results demonstrate that IRP is substituted for T and it reflects left ventricular relaxation even in different afterload conditions. We conclude that IRP is a simple parameter easily used to evaluate left ventricular relaxation in clinical situations.

  15. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  16. ℓ(p)-Norm multikernel learning approach for stock market price forecasting.

    PubMed

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.

  17. Impact of a New Law to Reduce the Legal Blood Alcohol Concentration Limit - A Poisson Regression Analysis and Descriptive Approach.

    PubMed

    Nistal-Nuño, Beatriz

    2017-03-31

    In Chile, a new law introduced in March 2012 lowered the blood alcohol concentration (BAC) limit for impaired drivers from 0.1% to 0.08% and the BAC limit for driving under the influence of alcohol from 0.05% to 0.03%, but its effectiveness remains uncertain. The goal of this investigation was to evaluate the effects of this enactment on road traffic injuries and fatalities in Chile. This was a retrospective cohort study. Data were analyzed using a descriptive approach and a Generalized Linear Models approach (Poisson regression) to analyze deaths and injuries in a series of additive Log-Linear Models accounting for the effects of law implementation, month influence, a linear time trend and population exposure. A review of national databases in Chile was conducted from 2003 to 2014 to evaluate the monthly rates of traffic fatalities and injuries, both alcohol-related and in total. A decrease of 28.1 percent was observed in the monthly rate of alcohol-related traffic fatalities compared with the period before the law (P<0.001). With a linear time trend added as a predictor, the decrease was 20.9 percent (P<0.001). The monthly rate of alcohol-related traffic injuries fell by 10.5 percent compared with the period before the law (P<0.001). With a linear time trend added as a predictor, the decrease was 24.8 percent (P<0.001). Positive results followed from this new 'zero-tolerance' law implemented in 2012 in Chile. Chile experienced a significant reduction in alcohol-related traffic fatalities and injuries, making the law a successful public health intervention.
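
    In a log-linear Poisson model, a coefficient $\beta$ on a law indicator corresponds to a rate ratio of $e^{\beta}$, so the percent change in the rate is $100\,(e^{\beta} - 1)$. The sketch below converts in both directions. The coefficient is back-calculated from the reported 28.1 percent decrease purely for illustration; the abstract does not report the fitted coefficients, and the function names are hypothetical.

```python
from math import exp, log

def percent_change(beta):
    """Percent change in the event rate implied by a log-linear coefficient."""
    return 100.0 * (exp(beta) - 1.0)

def coefficient_for(percent):
    """Log-linear coefficient implied by a percent change in the rate."""
    return log(1.0 + percent / 100.0)

# Back-calculated from the reported decrease, not a fitted value.
beta_law = coefficient_for(-28.1)
print(beta_law, percent_change(beta_law))
```

    A negative coefficient therefore reads as a proportional reduction in the monthly rate, holding the other terms (month, trend, exposure) fixed.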

  18. Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

    DTIC Science & Technology

    2015-07-15

    Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ... COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST ... "Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

  19. Prediction of the Main Engine Power of a New Container Ship at the Preliminary Design Stage

    NASA Astrophysics Data System (ADS)

    Cepowski, Tomasz

    2017-06-01

    The paper presents mathematical relationships that allow us to forecast the estimated main engine power of new container ships, based on data concerning vessels built in 2005-2015. The presented approximations allow us to estimate the engine power based on the length between perpendiculars and the number of containers the ship will carry. The approximations were developed using simple linear regression and multivariate linear regression analysis. The presented relations have practical application for estimation of container ship engine power needed in preliminary parametric design of the ship. It follows from the above that the use of multiple linear regression to predict the main engine power of a container ship brings more accurate solutions than simple linear regression.

  20. INNOVATIVE INSTRUMENTATION AND ANALYSIS OF THE TEMPERATURE MEASUREMENT FOR HIGH TEMPERATURE GASIFICATION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seong W. Lee

    During this reporting period, the literature survey including the gasifier temperature measurement literature, the ultrasonic application and its background study in cleaning application, and spray coating process are completed. The gasifier simulator (cold model) testing has been successfully conducted. Four factors (blower voltage, ultrasonic application, injection time intervals, particle weight) were considered as significant factors that affect the temperature measurement. The Analysis of Variance (ANOVA) was applied to analyze the test data. The analysis shows that all four factors are significant to the temperature measurements in the gasifier simulator (cold model). The regression analysis for the case with the normalized room temperature shows that the linear model fits the temperature data with 82% accuracy (18% error). The regression analysis for the case without the normalized room temperature shows 72.5% accuracy (27.5% error). The nonlinear regression analysis indicates a better fit than that of the linear regression. The nonlinear regression model's accuracy is 88.7% (11.3% error) for the normalized room temperature case, which is better than the linear regression analysis. The hot model thermocouple sleeve design and fabrication are completed. The gasifier simulator (hot model) design and the fabrication are completed. The system tests of the gasifier simulator (hot model) have been conducted and some modifications have been made. Based on the system tests and results analysis, the gasifier simulator (hot model) has met the proposed design requirement and is ready for system test. The ultrasonic cleaning method is under evaluation and will be further studied for the gasifier simulator (hot model) application. The progress of this project has been on schedule.

  1. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    PubMed

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
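
    One possible reading of the recipe above for logistic regression is sketched here: find two event probabilities whose log-odds differ by the slope times twice the covariate standard deviation while their average stays at the overall response probability, then apply a textbook two-proportion sample size formula. The specific two-sample formula, the bisection search, and all function names are assumptions made for this sketch; the paper's own formulas may differ in detail.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

def logit(p):
    return log(p / (1.0 - p))

def equivalent_two_sample(beta, sd_x, p_bar):
    """Return (p1, p2) with logit(p2) - logit(p1) = 2*|beta|*sd_x and mean p_bar."""
    delta = 2.0 * abs(beta) * sd_x
    if delta == 0.0:
        raise ValueError("a zero slope has no detectable effect")
    low, high = 0.0, min(p_bar, 1.0 - p_bar)
    for _ in range(100):                      # bisection on the half-gap d
        d = (low + high) / 2.0
        if logit(p_bar + d) - logit(p_bar - d) < delta:
            low = d
        else:
            high = d
    d = (low + high) / 2.0
    return p_bar - d, p_bar + d

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Textbook sample size per group for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    numerator = (z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_power * sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2)))
    return ceil((numerator / (p2 - p1)) ** 2)

# Assumed inputs: slope 0.5 per unit of x, SD of x equal to 1, 30% overall events.
p1, p2 = equivalent_two_sample(beta=0.5, sd_x=1.0, p_bar=0.3)
n_small_effect = n_per_group(p1, p2)
n_large_effect = n_per_group(*equivalent_two_sample(1.0, 1.0, 0.3))
print(p1, p2, n_small_effect, n_large_effect)
```

    Under this reading, the regression study would need roughly twice the per-group figure in total, and a steeper slope or a more variable covariate lowers the requirement.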

  2. Estimation of Standard Error of Regression Effects in Latent Regression Models Using Binder's Linearization. Research Report. ETS RR-07-09

    ERIC Educational Resources Information Center

    Li, Deping; Oranje, Andreas

    2007-01-01

    Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…

  3. Regression assumptions in clinical psychology research practice-a systematic review of common misconceptions.

    PubMed

    Ernst, Anja F; Albers, Casper J

    2017-01-01

    Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.
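
    The distinction drawn above, normality of the errors rather than of the variables, can be seen in a short simulation. In the sketch below the predictor is strongly skewed, so the outcome is skewed too, yet the residuals from the fitted line are close to symmetric because the noise term is normal. The data-generating values are arbitrary assumptions, and skewness is used only as a rough stand-in for a full normality check.

```python
import random

def skewness(values):
    """Moment-based sample skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(1)                                   # fixed seed
x = [rng.expovariate(1.0) for _ in range(5000)]          # skewed predictor
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]   # normal errors

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(skewness(x), skewness(y), skewness(residuals))
```

    Checking the raw variables here would wrongly suggest a violated assumption; checking the residuals would not.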

  4. Regression assumptions in clinical psychology research practice—a systematic review of common misconceptions

    PubMed Central

    Ernst, Anja F.

    2017-01-01

    Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971

  5. Comparing The Effectiveness of a90/95 Calculations (Preprint)

    DTIC Science & Technology

    2006-09-01

    Nachtsheim, John Neter, William Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear. For example is a linear model; is not. 2. Uniform variance (homoscedasticity

  6. Trends in Timing of Pregnancy Awareness Among US Women.

    PubMed

    Branum, Amy M; Ahrens, Katherine A

    2017-04-01

    Objectives Early pregnancy detection is important for improving pregnancy outcomes as the first trimester is a critical window of fetal development; however, there has been no description of trends in timing of pregnancy awareness among US women. Methods We examined data from the 1995, 2002, 2006-2010 and 2011-2013 National Survey of Family Growth on self-reported timing of pregnancy awareness among women aged 15-44 years who reported at least one pregnancy in the 4 or 5 years prior to interview that did not result in induced abortion or adoption (n = 17,406). We examined the associations between maternal characteristics and late pregnancy awareness (≥7 weeks' gestation) using adjusted prevalence ratios from logistic regression models. Gestational age at time of pregnancy awareness (continuous) was regressed over year of pregnancy conception (1990-2012) in a linear model. Results Among all pregnancies reported, gestational age at time of pregnancy awareness was 5.5 weeks (standard error = 0.04) and the prevalence of late pregnancy awareness was 23% (standard error = 1%). Late pregnancy awareness decreased with maternal age, was more prevalent among non-Hispanic black and Hispanic women compared to non-Hispanic white women, and for unintended pregnancies versus those that were intended (p < 0.01). Mean time of pregnancy awareness did not change linearly over a 23-year time period after adjustment for maternal age at the time of conception (p < 0.16). Conclusions for Practice On average, timing of pregnancy awareness did not change linearly during 1990-2012 among US women and occurs later among certain groups of women who are at higher risk of adverse birth outcomes.

  7. Trends in Timing of Pregnancy Awareness Among US Women

    PubMed Central

    2017-01-01

    Objectives Early pregnancy detection is important for improving pregnancy outcomes as the first trimester is a critical window of fetal development; however, there has been no description of trends in timing of pregnancy awareness among US women. Methods We examined data from the 1995, 2002, 2006–2010 and 2011–2013 National Survey of Family Growth on self-reported timing of pregnancy awareness among women aged 15–44 years who reported at least one pregnancy in the 4 or 5 years prior to interview that did not result in induced abortion or adoption (n = 17,406). We examined the associations between maternal characteristics and late pregnancy awareness (≥7 weeks’ gestation) using adjusted prevalence ratios from logistic regression models. Gestational age at time of pregnancy awareness (continuous) was regressed over year of pregnancy conception (1990–2012) in a linear model. Results Among all pregnancies reported, gestational age at time of pregnancy awareness was 5.5 weeks (standard error = 0.04) and the prevalence of late pregnancy awareness was 23% (standard error = 1%). Late pregnancy awareness decreased with maternal age, was more prevalent among non-Hispanic black and Hispanic women compared to non-Hispanic white women, and for unintended pregnancies versus those that were intended (p < 0.01). Mean time of pregnancy awareness did not change linearly over a 23-year time period after adjustment for maternal age at the time of conception (p < 0.16). Conclusions for Practice On average, timing of pregnancy awareness did not change linearly during 1990–2012 among US women and occurs later among certain groups of women who are at higher risk of adverse birth outcomes. PMID:27449777

  8. The increase in symptoms of anxiety and depressed mood among Icelandic adolescents: time trend between 2006 and 2016.

    PubMed

    Thorisdottir, Ingibjorg E; Asgeirsdottir, Bryndis B; Sigurvinsdottir, Rannveig; Allegrante, John P; Sigfusdottir, Inga D

    2017-10-01

    Both research and popular media reports suggest that adolescent mental health has been deteriorating across societies with advanced economies. This study sought to describe the trends in self-reported symptoms of depressed mood and anxiety among Icelandic adolescents. Data for this study come from repeated, cross-sectional, population-based school surveys of 43 482 Icelandic adolescents in 9th and 10th grade, with six waves of pooled data from 2006 to 2016. We used analysis of variance, linear regression and binomial logistic regression to examine trends in symptom scores of anxiety and depressed mood over time. Gender differences in trends of high symptoms were also tested for interactions. Linear regression analysis showed a significant linear increase over the course of the study period in mean symptoms of anxiety and depressed mood for girls only; however, symptoms of anxiety among boys decreased. The proportion of adolescents reporting high depressive symptoms increased by 1.6% for boys and 6.8% for girls; the proportion of those reporting high anxiety symptoms increased by 1.3% for boys and 8.6% for girls. Over the study period, the odds for reporting high depressive symptoms and high anxiety symptoms were significantly higher for both genders. Girls were more likely to report high symptoms of anxiety and depressed mood than boys. Self-reported symptoms of anxiety and depressed mood have increased over time among Icelandic adolescents. Our findings suggest that future research needs to look beyond mean changes and examine the trends among those adolescents who report high symptoms of emotional distress. © The Author 2017. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.

  9. Correlation and simple linear regression.

    PubMed

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.

  10. Regression to fuzziness method for estimation of remaining useful life in power plant components

    NASA Astrophysics Data System (ADS)

    Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.

    2014-10-01

    Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data is not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions, which was shown to perform equally to or worse than the presented methodology. Furthermore, the methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate fuzzy sets for parameter representation.
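
    As a rough illustration of the RUL definition above, the following sketch covers only the data-based simple linear regression baseline, not the fuzzy-set modeling. It assumes a single degradation parameter that drifts linearly toward a known failure value; the function names and the sample numbers are hypothetical and do not come from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def remaining_useful_life(times, values, failure_value):
    """Extrapolate the fitted trend to the failure value (assumed known).

    RUL = estimated failure time - time of the latest measurement.
    """
    intercept, slope = fit_line(times, values)
    failure_time = (failure_value - intercept) / slope
    return failure_time - times[-1]


# Hypothetical degradation readings taken at times 0..3.
rul = remaining_useful_life([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0], failure_value=10.0)
print(rul)
```

    A flat trend (zero slope) would make the extrapolation undefined, so a real implementation would need to handle that case explicitly.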

  11. Penalized nonparametric scalar-on-function regression via principal coordinates

    PubMed Central

    Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu

    2016-01-01

    A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963

  12. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

    PubMed

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-02-01

    A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
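
    To make the Bland-Altman idea concrete, here is a minimal sketch of the textbook computation: the mean difference between two methods (constant error, or bias) and the 95 % limits of agreement at bias ± 1.96 standard deviations. Regressing the differences on the pair means to flag proportional error is a common extension and is an assumption here, not necessarily the authors' procedure; the function name and the sample values are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Bias, 95 % limits of agreement, and slope of differences vs. pair means."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]

    bias = statistics.mean(diffs)   # constant error
    sd = statistics.stdev(diffs)    # sample standard deviation of the differences

    # Least-squares slope of differences on means; a slope far from zero
    # suggests the disagreement grows or shrinks with the measured value.
    mean_m = statistics.mean(means)
    sxx = sum((m - mean_m) ** 2 for m in means)
    sxy = sum((m - mean_m) * (d - bias) for m, d in zip(means, diffs))

    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "proportional_slope": sxy / sxx,
    }


# Hypothetical paired measurements from two assays.
result = bland_altman([10, 20, 30, 40], [12, 21, 33, 42])
print(result)
```

    Unlike R², which can be high even when one assay reads consistently higher than the other, the bias and the limits are expressed in the measurement's own units.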

  13. New Approach To Hour-By-Hour Weather Forecast

    NASA Astrophysics Data System (ADS)

    Liao, Q. Q.; Wang, B.

    2017-12-01

    Fine hourly forecasts at single stations are required in many production and everyday-life applications. Most previous MOS (Model Output Statistics) schemes, which use a linear regression model, struggle with the nonlinear nature of weather prediction, and forecast accuracy has not been sufficient at high temporal resolution. This study aims to predict future meteorological elements, including temperature, precipitation, relative humidity and wind speed, in a local region over a relatively short period of time at the hourly level. Using hour-by-hour NWP (Numerical Weather Prediction) meteorological fields from Forcastio (https://darksky.net/dev/docs/forecast) and real-time instrumental observations from 29 stations in Yunnan and 3 stations in Tianjin, China, from June to October 2016, hour-by-hour predictions are made 24 hours ahead. This study presents an ensemble approach that combines the information in the instrumental observations themselves with NWP. An autoregressive-moving-average (ARMA) model is used to predict future values of the observation time series. The newest NWP products are put into the equations derived from the multiple linear regression MOS technique. The residual series of the MOS outputs is handled with an autoregressive (AR) model to capture the linear property present in the time series. Because of the complexity of the nonlinear property of atmospheric flow, a support vector machine (SVM) is also introduced. Basic data quality control and cross-validation make it possible to optimize the model function parameters and to perform residual reduction 24 hours ahead with the AR/SVM model. Results show that the AR model technique is better than the corresponding multivariate MOS regression method, especially in the first 4 hours when the predictor is temperature. The MOS-AR combined model, which is comparable to the MOS-SVM model, outperforms MOS. For both, the root mean square error for 2 m temperature is reduced to 1.6 degrees Celsius and the correlation coefficient is 0.91. The share of 24-hour forecasts with a deviation of no more than 2 degrees Celsius is 78.75 % for the MOS-AR model and 81.23 % for the AR model.

  14. Healthcare Expenditures Associated with Depression Among Individuals with Osteoarthritis: Post-Regression Linear Decomposition Approach.

    PubMed

    Agarwal, Parul; Sambamoorthi, Usha

    2015-12-01

    Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least square regressions (OLS) were used to examine differences in healthcare expenditures between those with and without depression. Post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analysis accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p < 0.001) compared to those without depression. Post-regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.

  15. U.S. Army Armament Research, Development and Engineering Center Grain Evaluation Software to Numerically Predict Linear Burn Regression for Solid Propellant Grain Geometries

    DTIC Science & Technology

    2017-10-01

    ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...author(s) and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other documentation...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID

  16. Linear regression in astronomy. II

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  17. A Constrained Linear Estimator for Multiple Regression

    ERIC Educational Resources Information Center

    Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

    2010-01-01

    "Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…

  18. Runoff load estimation of particulate and dissolved nitrogen in Lake Inba watershed using continuous monitoring data on turbidity and electric conductivity.

    PubMed

    Kim, J; Nagano, Y; Furumai, H

    2012-01-01

    Easy-to-measure surrogate parameters for water quality indicators are needed for real time monitoring as well as for generating data for model calibration and validation. In this study, a novel linear regression model for estimating total nitrogen (TN) based on two surrogate parameters is proposed based on evaluation of pollutant loads flowing into a eutrophic lake. Based on their runoff characteristics during wet weather, turbidity and electric conductivity (EC) were selected as surrogates for particulate nitrogen (PN) and dissolved nitrogen (DN), respectively. Strong linear relationships were established between PN and turbidity and between DN and EC, and both models were subsequently combined for estimation of TN. This model was evaluated by comparison of estimated and observed TN runoff loads during rainfall events. This analysis showed that turbidity and EC are viable surrogates for PN and DN, respectively, and that the linear regression model for TN concentration was successful in estimating TN runoff loads during rainfall events and also under dry weather conditions.
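
    The two-surrogate model described above can be sketched as two separate least-squares lines, PN against turbidity and DN against EC, whose predictions are summed. The sum TN = PN + DN is an assumption about how the two models are combined; the function names and the calibration values are hypothetical and are not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def make_tn_model(turbidity, pn, ec, dn):
    """Calibrate PN ~ turbidity and DN ~ EC, then return a TN estimator."""
    pn_a, pn_b = fit_line(turbidity, pn)
    dn_a, dn_b = fit_line(ec, dn)

    def estimate_tn(turbidity_value, ec_value):
        # Assumed combination: total nitrogen = particulate + dissolved.
        return (pn_a + pn_b * turbidity_value) + (dn_a + dn_b * ec_value)

    return estimate_tn


# Hypothetical calibration samples.
estimate_tn = make_tn_model(
    turbidity=[1.0, 2.0, 3.0], pn=[0.5, 1.0, 1.5],
    ec=[10.0, 20.0, 30.0], dn=[1.0, 2.0, 3.0],
)
print(estimate_tn(4.0, 40.0))
```

    Because both surrogates can be logged continuously, such an estimator could be applied to each sensor reading and then multiplied by flow to obtain a load.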

  19. Multivariate meta-analysis for non-linear and other multi-parameter associations

    PubMed Central

    Gasparrini, A; Armstrong, B; Kenward, M G

    2012-01-01

    In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043

  20. Linear-regression convolutional neural network for fully automated coronary lumen segmentation in intravascular optical coherence tomography.

    PubMed

    Yong, Yan Ling; Tan, Li Kuo; McLaughlin, Robert A; Chee, Kok Han; Liew, Yih Miin

    2017-12-01

    Intravascular optical coherence tomography (OCT) is an optical imaging modality commonly used in the assessment of coronary artery diseases during percutaneous coronary intervention. Manual segmentation to assess luminal stenosis from OCT pullback scans is challenging and time consuming. We propose a linear-regression convolutional neural network to automatically perform vessel lumen segmentation, parameterized in terms of radial distances from the catheter centroid in polar space. Benchmarked against gold-standard manual segmentation, our proposed algorithm achieves average locational accuracy of the vessel wall of 22 microns, and 0.985 and 0.970 in Dice coefficient and Jaccard similarity index, respectively. The average absolute error of luminal area estimation is 1.38%. The processing rate is 40.6 ms per image, suggesting the potential to be incorporated into a clinical workflow and to provide quantitative assessment of vessel lumen in an intraoperative time frame. (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  1. On the design of classifiers for crop inventories

    NASA Technical Reports Server (NTRS)

    Heydorn, R. P.; Takacs, H. C.

    1986-01-01

    Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations is linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.

  2. Bayesian Travel Time Inversion adopting Gaussian Process Regression

    NASA Astrophysics Data System (ADS)

    Mauerberger, S.; Holschneider, M.

    2017-12-01

    A major application in seismology is the determination of seismic velocity models. Travel time measurements put an integral constraint on the velocity between source and receiver. We provide insight into travel time inversion from a correlation-based Bayesian point of view. To this end, the concept of Gaussian process regression is adopted to estimate a velocity model. The non-linear travel time integral is approximated by a first-order Taylor expansion. A heuristic covariance describes correlations amongst the observations and the a priori model. That approach enables us to assess a proxy of the Bayesian posterior distribution at ordinary computational cost. Neither multidimensional numerical integration nor excessive sampling is necessary. Instead of stacking the data, we suggest progressively building the posterior distribution. Incorporating only a single piece of evidence at a time accounts for the deficit of linearization. As a result, the most probable model is given by the posterior mean whereas uncertainties are described by the posterior covariance. As a proof of concept, a synthetic, purely 1-D model is addressed. A single source accompanied by multiple receivers is considered on top of a model comprising a discontinuity. We consider travel times of both phases, the direct and the reflected wave, corrupted by noise. The regions left and right of the interface are assumed independent, and the squared exponential kernel serves as covariance.

  3. Placement Model for First-Time Freshmen in Calculus I (Math 131): University of Northern Colorado

    ERIC Educational Resources Information Center

    Heiny, Robert L.; Heiny, Erik L.; Raymond, Karen

    2017-01-01

    Two approaches, Linear Discriminant Analysis, and Logistic Regression are used and compared to predict success or failure for first-time freshmen in the first calculus course at a medium-sized public, 4-year institution prior to Fall registration. The predictor variables are high school GPA, the number, and GPA's of college prep mathematics…

  4. Assessing the potential for improving S2S forecast skill through multimodel ensembling

    NASA Astrophysics Data System (ADS)

    Vigaud, N.; Robertson, A. W.; Tippett, M. K.; Wang, L.; Bell, M. J.

    2016-12-01

    Non-linear logistic regression is well suited to probability forecasting and has been successfully applied in the past to ensemble weather and climate predictions, providing access to the full probability distribution without any Gaussian assumption. However, little work has been done at sub-monthly lead times where relatively small re-forecast ensembles and lengths represent new challenges for which post-processing avenues have yet to be investigated. A promising approach consists in extending the definition of non-linear logistic regression by including the quantile of the forecast distribution as one of the predictors. So-called Extended Logistic Regression (ELR), which enables mutually consistent individual threshold probabilities, is here applied to ECMWF, CFSv2 and CMA re-forecasts from the S2S database in order to produce rainfall probabilities at weekly resolution. The ELR model is trained on seasonally-varying tercile categories computed for lead times of 1 to 4 weeks. It is then tested in a cross-validated manner, i.e. allowing real-time predictability applications, to produce rainfall tercile probabilities from individual weekly hindcasts that are finally combined by equal pooling. Results will be discussed over a broader North American region, where individual and MME forecasts generated out to 4 weeks lead are characterized by good probabilistic reliability but low sharpness, exhibiting systematically more skill in winter than summer.

  5. An Experimental Study in Determining Energy Expenditure from Treadmill Walking using Hip-Worn Inertial Sensors

    PubMed Central

    Vathsangam, Harshvardhan; Emken, Adar; Schroeder, E. Todd; Spruijt-Metz, Donna; Sukhatme, Gaurav S.

    2011-01-01

    This paper describes an experimental study in estimating energy expenditure from treadmill walking using a single hip-mounted triaxial inertial sensor comprised of a triaxial accelerometer and a triaxial gyroscope. Typical physical activity characterization using accelerometer generated counts suffers from two drawbacks - imprecision (due to proprietary counts) and incompleteness (due to incomplete movement description). We address these problems in the context of steady state walking by directly estimating energy expenditure with data from a hip-mounted inertial sensor. We represent the cyclic nature of walking with a Fourier transform of sensor streams and show how one can map this representation to energy expenditure (as measured by VO2 consumption, mL/min) using three regression techniques - Least Squares Regression (LSR), Bayesian Linear Regression (BLR) and Gaussian Process Regression (GPR). We perform a comparative analysis of the accuracy of sensor streams in predicting energy expenditure (measured by RMS prediction accuracy). Triaxial information is more accurate than uniaxial information. LSR based approaches are prone to outlier sensitivity and overfitting. Gyroscopic information showed equivalent if not better prediction accuracy as compared to accelerometers. Combining accelerometer and gyroscopic information provided better accuracy than using either sensor alone. We also analyze the best algorithmic approach among linear and nonlinear methods as measured by RMS prediction accuracy and run time. Nonlinear regression methods showed better prediction accuracy but required an order of magnitude more run time. This paper emphasizes the role of probabilistic techniques in conjunction with joint modeling of triaxial accelerations and rotational rates to improve energy expenditure prediction for steady-state treadmill walking. PMID:21690001

  6. pLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies.

    PubMed

    Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M

    2017-06-01

    Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of multilocus models in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate the kinship matrix as polygenic background control. The selected SNPs in the multilocus model were further tested for their association with the trait by empirical Bayes and likelihood ratio test. We herein refer to this method as pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had a lower false positive rate and required less computing time than the Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, the multilocus random-SNP-effect mixed linear model and the fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.

  7. Changes in the timing of snowmelt and streamflow in Colorado: A response to recent warming

    USGS Publications Warehouse

    Clow, David W.

    2010-01-01

    Trends in the timing of snowmelt and associated runoff in Colorado were evaluated for the 1978-2007 water years using the regional Kendall test (RKT) on daily snow-water equivalent (SWE) data from snowpack telemetry (SNOTEL) sites and daily streamflow data from headwater streams. The RKT is a robust, nonparametric test that provides an increased power of trend detection by grouping data from multiple sites within a given geographic region. The RKT analyses indicated strong, pervasive trends in snowmelt and streamflow timing, which have shifted toward earlier in the year by a median of 2-3 weeks over the 29-yr study period. In contrast, relatively few statistically significant trends were detected using simple linear regression. RKT analyses also indicated that November-May air temperatures increased by a median of 0.9 °C per decade, while 1 April SWE and maximum SWE declined by a median of 4.1 and 3.6 cm per decade, respectively. Multiple linear regression models were created, using monthly air temperatures, snowfall, latitude, and elevation as explanatory variables to identify major controlling factors on snowmelt timing. The models accounted for 45% of the variance in snowmelt onset, and 78% of the variance in the snowmelt center of mass (when half the snowpack had melted). Variations in springtime air temperature and SWE explained most of the interannual variability in snowmelt timing. Regression coefficients for air temperature were negative, indicating that warm temperatures promote early melt. Regression coefficients for SWE, latitude, and elevation were positive, indicating that abundant snowfall tends to delay snowmelt, and snowmelt tends to occur later at northern latitudes and high elevations. Results from this study indicate that even the mountains of Colorado, with their high elevations and cold snowpacks, are experiencing substantial shifts in the timing of snowmelt and snowmelt runoff toward earlier in the year.

  8. A new graphic plot analysis for determination of neuroreceptor binding in positron emission tomography studies.

    PubMed

    Ito, Hiroshi; Yokoi, Takashi; Ikoma, Yoko; Shidahara, Miho; Seki, Chie; Naganawa, Mika; Takahashi, Hidehiko; Takano, Harumasa; Kimura, Yuichi; Ichise, Masanori; Suhara, Tetsuya

    2010-01-01

    In positron emission tomography (PET) studies with radioligands for neuroreceptors, tracer kinetics have been described by the standard two-tissue compartment model that includes the compartments of nondisplaceable binding and specific binding to receptors. In the present study, we have developed a new graphic plot analysis to determine the total distribution volume (V(T)) and nondisplaceable distribution volume (V(ND)) independently, and therefore the binding potential (BP(ND)). In this plot, Y(t) is the ratio of brain tissue activity to time-integrated arterial input function, and X(t) is the ratio of time-integrated brain tissue activity to time-integrated arterial input function. The x-intercept of linear regression of the plots for early phase represents V(ND), and the x-intercept of linear regression of the plots for delayed phase after the equilibrium time represents V(T). BP(ND) can be calculated by BP(ND)=V(T)/V(ND)-1. Dynamic PET scanning with measurement of arterial input function was performed on six healthy men after intravenous rapid bolus injection of [(11)C]FLB457. The plot yielded a curve in regions with specific binding while it yielded a straight line through all plot data in regions with no specific binding. V(ND), V(T), and BP(ND) values calculated by the present method were in good agreement with those by conventional non-linear least-squares fitting procedure. This method can be used to distinguish graphically whether the radioligand binding includes specific binding or not.
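
    The final step of the plot analysis described above can be sketched as follows: fit a straight line to the early-phase points and another to the delayed-phase points, read off each x-intercept as V(ND) and V(T), and apply BP(ND) = V(T)/V(ND) - 1. The sketch assumes the (X, Y) plot coordinates have already been computed from the tissue and arterial input curves and that the split into early and delayed phases is given; the function names and the synthetic points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def x_intercept(xs, ys):
    """X value at which the fitted regression line crosses y = 0."""
    intercept, slope = fit_line(xs, ys)
    return -intercept / slope


def binding_potential(early_x, early_y, late_x, late_y):
    """Return (BP_ND, V_T, V_ND) from early- and delayed-phase plot points."""
    v_nd = x_intercept(early_x, early_y)   # early phase -> V(ND)
    v_t = x_intercept(late_x, late_y)      # delayed phase -> V(T)
    return v_t / v_nd - 1, v_t, v_nd


# Synthetic points lying on lines with x-intercepts 2 (early) and 5 (delayed).
bp_nd, v_t, v_nd = binding_potential(
    [0.5, 1.0, 1.5], [0.75, 0.5, 0.25],
    [3.0, 4.0, 4.5], [0.2, 0.1, 0.05],
)
print(bp_nd, v_t, v_nd)
```

    In a region without specific binding the two fitted lines coincide, so V(T) equals V(ND) and the computed BP(ND) is close to zero.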

  9. A comparative study of generalized linear mixed modelling and artificial neural network approach for the joint modelling of survival and incidence of Dengue patients in Sri Lanka

    NASA Astrophysics Data System (ADS)

    Hapugoda, J. C.; Sooriyarachchi, M. R.

    2017-09-01

    Survival time of patients with a disease and the incidence of that particular disease (count) are frequently observed in medical studies with data of a clustered nature. In many cases, though, the survival times and the count can be correlated in such a way that diseases that occur rarely could have shorter survival times or vice versa. Due to this fact, joint modelling of these two variables will provide more interesting and certainly better results than modelling them separately. The authors have previously proposed a methodology using Generalized Linear Mixed Models (GLMM) by joining the Discrete Time Hazard model with the Poisson Regression model to jointly model survival and count. As the Artificial Neural Network (ANN) has become a powerful computational tool to model complex non-linear systems, it was proposed to develop a new joint model of survival and count of Dengue patients of Sri Lanka by using that approach. Thus, the objective of this study is to develop a model using the ANN approach and compare the results with the previously developed GLMM model. As the response variables are continuous in nature, the Generalized Regression Neural Network (GRNN) approach was adopted to model the data. To compare the model fit, measures such as root mean square error (RMSE), absolute mean error (AME) and correlation coefficient (R) were used. The measures indicate the GRNN model fits the data better than the GLMM model.

  10. Spatially resolved regression analysis of pre-treatment FDG, FLT and Cu-ATSM PET from post-treatment FDG PET: an exploratory study

    PubMed Central

    Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert

    2012-01-01

    Purpose: To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods: Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R². Hypothesis testing of coefficients over the patient population was performed. Results: Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost ~ 0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost ~ FDGpre^0.93, p<0.001). Univariate mixture model fits of FDGpre improved R² from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions: Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748

  11. An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.

    DTIC Science & Technology

    1983-09-01

    books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Stanford, Applied Linear Regression , Wiley, 1980. 40

  12. Testing hypotheses for differences between linear regression lines

    Treesearch

    Stanley J. Zarnoch

    2009-01-01

    Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...

  13. Graphical Description of Johnson-Neyman Outcomes for Linear and Quadratic Regression Surfaces.

    ERIC Educational Resources Information Center

    Schafer, William D.; Wang, Yuh-Yin

    A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn…

  14. Teaching the Concept of Breakdown Point in Simple Linear Regression.

    ERIC Educational Resources Information Center

    Chan, Wai-Sum

    2001-01-01

    Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…

  15. Serum 25-hydroxyvitamin D level is associated with myopia in the Korea national health and nutrition examination survey.

    PubMed

    Kwon, Jin-Woo; Choi, Jin A; La, Tae Yoon

    2016-11-01

    The aim of this article was to assess the associations of serum 25-hydroxyvitamin D [25(OH)D] and daily sun exposure time with myopia in Korean adults. This study is based on the Korea National Health and Nutrition Examination Survey (KNHANES) of Korean adults in 2010-2012; multiple logistic regression analyses were performed to examine the associations of serum 25(OH)D levels and daily sun exposure time with myopia, defined as spherical equivalent ≤-0.5D, after adjustment for age, sex, household income, body mass index (BMI), exercise, intraocular pressure (IOP), and education level. Also, multiple linear regression analyses were performed to examine the relationship between serum 25(OH)D levels and spherical equivalent after adjustment for daily sun exposure time in addition to the confounding factors above. Between the nonmyopic and myopic groups, spherical equivalent, age, IOP, BMI, waist circumference, education level, household income, and area of residence differed significantly (all P < 0.05). Compared with subjects with daily sun exposure time <2 hours, subjects with sun exposure time ≥2 to <5 hours and those with sun exposure time ≥5 hours had significantly less myopia (P < 0.001). In addition, when subjects were categorized into quartiles of serum 25(OH)D, the higher quartiles had gradually lower prevalences of myopia after adjustment for confounding factors (P < 0.001). In multiple linear regression analyses, spherical equivalent was significantly associated with serum 25(OH)D concentration after adjustment for confounding factors (P = 0.002). Low serum 25(OH)D levels and shorter daily sun exposure time may be independently associated with a high prevalence of myopia in Korean adults. These data suggest a direct role for vitamin D in the development of myopia.

  16. Locally linear regression for pose-invariant face recognition.

    PubMed

    Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen

    2007-07-01

    The variation of facial appearance due to the viewpoint (/pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show distinct advantage of the proposed method over Eigen light-field method.

  17. ℓp-Norm Multikernel Learning Approach for Stock Market Price Forecasting

    PubMed Central

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓp-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ1-norm multiple support vector regression model. PMID:23365561

  18. Browning of the landscape of interior Alaska based on 1986-2009 Landsat sensor NDVI

    Treesearch

    Rebecca A. Baird; David Verbyla; Teresa N. Hollingsworth

    2012-01-01

    We used a time series of 1986-2009 Landsat sensor data to compute the Normalized Difference Vegetation Index (NDVI) for 30 m pixels within the Bonanza Creek Experimental Forest of interior Alaska. Based on simple linear regression, we found significant (p

  19. RESOLUTION OF THE DESTRUCTIVE EFFECT OF NOISE ON LINEAR REGRESSION OF TWO TIME SERIES. (R825260)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  20. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study

    PubMed Central

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf

    2015-01-01

    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200–400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037

  1. Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.

    PubMed

    Gijsberts, Arjan; Metta, Giorgio

    2013-05-01

    Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
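
    The combination named in this abstract, a finite-dimensional random feature map that approximates a kernel plus an update whose cost does not grow with the number of samples, can be sketched as below. This is a minimal illustration and not the authors' implementation: it accumulates the regularized normal equations of a Bayesian linear model on random cosine/sine features and solves them at prediction time, whereas the published method may use a different update scheme. The class name, parameters, and defaults are hypothetical.

```python
import numpy as np

class IncrementalRFFRegressor:
    """Linear regression on random Fourier features with per-sample updates."""

    def __init__(self, input_dim, num_frequencies=50, lengthscale=1.0,
                 noise=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Random frequencies drawn from the spectrum of a Gaussian (RBF) kernel.
        self.omega = rng.normal(0.0, 1.0 / lengthscale,
                                size=(num_frequencies, input_dim))
        dim = 2 * num_frequencies
        self.scale = 1.0 / np.sqrt(num_frequencies)
        # A = noise^2 * I + sum(phi phi^T), b = sum(phi * y); both have fixed size.
        self.A = noise ** 2 * np.eye(dim)
        self.b = np.zeros(dim)

    def features(self, x):
        proj = self.omega @ np.asarray(x, dtype=float)
        return self.scale * np.concatenate([np.cos(proj), np.sin(proj)])

    def update(self, x, y):
        """Absorb one (x, y) pair; cost depends on feature count, not sample count."""
        phi = self.features(x)
        self.A += np.outer(phi, phi)
        self.b += phi * y

    def predict(self, x):
        weights = np.linalg.solve(self.A, self.b)
        return float(self.features(x) @ weights)
```

    Because A and b have fixed dimensions, memory and update time stay constant as more samples arrive, which is the property the abstract emphasizes.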

  2. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  3. Effect of Malmquist bias on correlation studies with IRAS data base

    NASA Technical Reports Server (NTRS)

    Verter, Frances

    1993-01-01

    The relationships between galaxy properties in the sample of Trinchieri et al. (1989) are reexamined with corrections for Malmquist bias. The linear correlations are tested and linear regressions are fit for log-log plots of L(FIR), L(H-alpha), and L(B) as well as ratios of these quantities. The linear correlations for Malmquist bias are corrected using the method of Verter (1988), in which each galaxy observation is weighted by the inverse of its sampling volume. The linear regressions are corrected for Malmquist bias by a new method invented here in which each galaxy observation is weighted by its sampling volume. The results of correlation and regressions among the sample are significantly changed in the anticipated sense that the corrected correlation confidences are lower and the corrected slopes of the linear regressions are lower. The elimination of Malmquist bias eliminates the nonlinear rise in luminosity that has caused some authors to hypothesize additional components of FIR emission.

  4. Logic regression and its extensions.

    PubMed

    Schwender, Holger; Ruczinski, Ingo

    2010-01-01

    Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.

  5. A reliable and cost effective approach for radiographic monitoring in nutritional rickets.

    PubMed

    Chatterjee, D; Gupta, V; Sharma, V; Sinha, B; Samanta, S

    2014-04-01

    Radiological scoring is particularly useful in rickets, where pre-treatment radiographical findings can reflect the disease severity and can be used to monitor the improvement. However, there is only a single radiographic scoring system for rickets developed by Thacher and, to the best of our knowledge, no study has evaluated radiographic changes in rickets based on this scoring system apart from the one done by Thacher himself. The main objective of this study is to compare and analyse the pre-treatment and post-treatment radiographic parameters in nutritional rickets with the help of Thacher's scoring technique. 176 patients with nutritional rickets were given a single intramuscular injection of vitamin D (600 000 IU) along with oral calcium (50 mg kg(-1)) and vitamin D (400 IU per day) until radiological resolution and followed for 1 year. Pre- and post-treatment radiological parameters were compared and analysed statistically based on Thacher's scoring system. Radiological resolution was complete by 6 months. Time for radiological resolution and initial radiological score were linearly associated on regression analysis. The distal ulna was the last to heal in most cases except when the initial score was 10, when distal femur was the last to heal. Thacher's scoring system can effectively monitor nutritional rickets. The formula derived through linear regression has prognostic significance. The distal femur is a better indicator in radiologically severe rickets and when resolution is delayed. Thacher's scoring is very useful for monitoring of rickets. The formula derived through linear regression can predict the expected time for radiological resolution.

  6. Impact of divorce on the quality of life in school-age children.

    PubMed

    Eymann, Alfredo; Busaniche, Julio; Llera, Julián; De Cunto, Carmen; Wahren, Carlos

    2009-01-01

    To assess psychosocial quality of life in school-age children of divorced parents. A cross-sectional survey was conducted at the pediatric outpatient clinic of a community hospital. Children 5 to 12 years old from married families and divorced families were included. Child quality of life was assessed through maternal reports using a Child Health Questionnaire-Parent Form 50. A multiple linear regression model was constructed including clinically relevant variables significant on univariate analysis (beta coefficient and 95%CI). Three hundred and thirty families were invited to participate and 313 completed the questionnaire. Univariate analysis showed that quality of life was significantly associated with parental separation, child sex, time spent with the father, standard of living, and maternal education. In a multiple linear regression model, quality of life scores decreased in boys -4.5 (-6.8 to -2.3) and increased for time spent with the father 0.09 (0.01 to 0.2). In divorced families, multiple linear regression showed that quality of life scores increased when parents had separated by mutual agreement 6.1 (2.7 to 9.4), when the mother had university level education 5.9 (1.7 to 10.1) and for each year elapsed since separation 0.6 (0.2 to 1.1), whereas scores decreased in boys -5.4 (-9.5 to -1.3) and for each one-year increment of maternal age -0.4 (-0.7 to -0.05). Children's psychosocial quality of life was affected by divorce. The Child Health Questionnaire can be useful to detect a decline in the psychosocial quality of life.

  7. A primer for biomedical scientists on how to execute model II linear regression analysis.

    PubMed

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
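
    For readers who want to try the spreadsheet-level calculation mentioned in point 4, the following Python sketch computes an ordinary least products fit using the formula commonly given for it (also known as geometric mean or reduced major axis regression): the slope is the ratio of the standard deviations of y and x, signed by the correlation. The abstract itself does not state the formula, so this form is an assumption, and the function name is hypothetical. No confidence intervals are computed, in line with the abstract's warning that simple calculations of them are inaccurate.

```python
import numpy as np

def olp_regression(x, y):
    """Return (slope, intercept) of the ordinary least products line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    # Slope magnitude is sd(y)/sd(x); its sign follows the correlation.
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    # The line passes through the point of means.
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Hypothetical paired measurements, both subject to error.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
olp_slope, olp_intercept = olp_regression(x, y)
ols_slope, ols_intercept = np.polyfit(x, y, 1)
print(olp_slope, olp_intercept, ols_slope, ols_intercept)
```

    Since the ordinary least squares slope equals r times sd(y)/sd(x), the OLP slope is never smaller in magnitude than the OLS slope; the two coincide only when the correlation is perfect.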

  8. Analyzing Multilevel Data: Comparing Findings from Hierarchical Linear Modeling and Ordinary Least Squares Regression

    ERIC Educational Resources Information Center

    Rocconi, Louis M.

    2013-01-01

    This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…

  9. A Linear Dynamical Systems Approach to Streamflow Reconstruction Reveals History of Regime Shifts in Northern Thailand

    NASA Astrophysics Data System (ADS)

    Nguyen, Hung T. T.; Galelli, Stefano

    2018-03-01

    Catchment dynamics is not often modeled in streamflow reconstruction studies; yet, the streamflow generation process depends on both catchment state and climatic inputs. To explicitly account for this interaction, we contribute a linear dynamic model, in which streamflow is a function of both catchment state (i.e., wet/dry) and paleoclimatic proxies. The model is learned using a novel variant of the Expectation-Maximization algorithm, and it is used with a paleo drought record—the Monsoon Asia Drought Atlas—to reconstruct 406 years of streamflow for the Ping River (northern Thailand). Results for the instrumental period show that the dynamic model has higher accuracy than conventional linear regression; all performance scores improve by 45-497%. Furthermore, the reconstructed trajectory of the state variable provides valuable insights about the catchment history—e.g., regime-like behavior—thereby complementing the information contained in the reconstructed streamflow time series. The proposed technique can replace linear regression, since it only requires information on streamflow and climatic proxies (e.g., tree-rings, drought indices); furthermore, it is capable of readily generating stochastic streamflow replicates. With a marginal increase in computational requirements, the dynamic model brings more desirable features and value to streamflow reconstructions.

  10. Changes in aerobic power of women, ages 20-64 yr

    NASA Technical Reports Server (NTRS)

    Jackson, A. S.; Wier, L. T.; Ayers, G. W.; Beard, E. F.; Stuteville, J. E.; Blair, S. N.

    1996-01-01

    This study quantified and compared the cross-sectional and longitudinal influence of age, self-report physical activity (SR-PA), and body composition (%fat) on the decline of maximal aerobic power (VO2peak) of women. The cross-sectional sample consisted of 409 healthy women, ages 20-64 yr. The 43 women of the longitudinal sample were from the same population and examined twice; the mean time between tests was 3.7 (+/-2.2) yr. Peak oxygen uptake was determined by indirect calorimetry during a maximal treadmill test. The zero-order correlation of -0.742 between VO2peak and %fat was significantly (P < 0.05) higher than the SR-PA (r = 0.626) and age correlations (r = -0.633). Linear regression defined the cross-sectional age-related decline in VO2peak at 0.537 ml·kg⁻¹·min⁻¹·yr⁻¹. Multiple regression analysis (R = 0.851) showed that adding %fat and SR-PA and their interaction to the regression model reduced the age regression weight of -0.537 to -0.265 ml·kg⁻¹·min⁻¹·yr⁻¹. Statistically controlling for time differences between tests, general linear models analysis showed that longitudinal changes in aerobic power were due to independent changes in %fat and SR-PA, confirming the cross-sectional results. These findings are consistent with men's data from the same lab showing that about 50% of the cross-sectional age-related decline in VO2peak was due to %fat and SR-PA.

  11. Regression analysis of sparse asynchronous longitudinal data.

    PubMed

    Cao, Hongyuan; Zeng, Donglin; Fine, Jason P

    2015-09-01

    We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus.

  12. Validity of Treadmill-Derived Critical Speed on Predicting 5000-Meter Track-Running Performance.

    PubMed

    Nimmerichter, Alfred; Novak, Nina; Triska, Christoph; Prinz, Bernhard; Breese, Brynmor C

    2017-03-01

    Nimmerichter, A, Novak, N, Triska, C, Prinz, B, and Breese, BC. Validity of treadmill-derived critical speed on predicting 5,000-meter track-running performance. J Strength Cond Res 31(3): 706-714, 2017-To evaluate 3 models of critical speed (CS) for the prediction of 5,000-m running performance, 16 trained athletes completed an incremental test on a treadmill to determine maximal aerobic speed (MAS) and 3 randomly ordered runs to exhaustion at the Δ70% intensity, at 110% and 98% of MAS. Critical speed and the distance covered above CS (D') were calculated using the hyperbolic speed-time (HYP), the linear distance-time (LIN), and the linear speed inverse-time model (INV). Five thousand meter performance was determined on a 400-m running track. Individual predictions of 5,000-m running time (t = [5,000-D']/CS) and speed (s = D'/t + CS) were calculated across the 3 models in addition to multiple regression analyses. Prediction accuracy was assessed with the standard error of estimate (SEE) from linear regression analysis and the mean difference expressed in units of measurement and coefficient of variation (%). Five thousand meter running performance (speed: 4.29 ± 0.39 m·s⁻¹; time: 1,176 ± 117 seconds) was significantly better than the predictions from all 3 models (p < 0.0001). The mean difference was 65-105 seconds (5.7-9.4%) for time and -0.22 to -0.34 m·s⁻¹ (-5.0 to -7.5%) for speed. Predictions from multiple regression analyses with CS and D' as predictor variables were not significantly different from actual running performance (-1.0 to 1.1%). The SEE across all models and predictions was approximately 65 seconds or 0.20 m·s⁻¹ and is therefore considered as moderate. The results of this study have shown the importance of aerobic and anaerobic energy system contribution to predict 5,000-m running performance. Using estimates of CS and D' is valuable for predicting performance over race distances of 5,000 m.
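
    Two of the three models named in this abstract reduce to ordinary straight-line fits, which the following Python sketch illustrates: the linear distance-time model (distance = CS × time + D') and the linear speed inverse-time model (speed = D' × 1/time + CS), followed by the abstract's prediction formula t = (5,000 - D')/CS. The function names and the three synthetic runs are assumptions for illustration; the hyperbolic model, which needs a nonlinear fit, is omitted.

```python
import numpy as np

def fit_lin(times_s, distances_m):
    """LIN model: distance = CS * time + D'. Returns (CS, D')."""
    cs, d_prime = np.polyfit(times_s, distances_m, 1)
    return cs, d_prime

def fit_inv(times_s, distances_m):
    """INV model: speed = D' * (1 / time) + CS. Returns (CS, D')."""
    times = np.asarray(times_s, dtype=float)
    speeds = np.asarray(distances_m, dtype=float) / times
    d_prime, cs = np.polyfit(1.0 / times, speeds, 1)
    return cs, d_prime

def predict_5000_time(cs, d_prime):
    """Predicted 5,000-m time in seconds: t = (5000 - D') / CS."""
    return (5000.0 - d_prime) / cs

# Hypothetical runs to exhaustion generated from CS = 4.0 m/s and D' = 200 m.
times = np.array([180.0, 400.0, 900.0])
distances = 4.0 * times + 200.0

cs_lin, dp_lin = fit_lin(times, distances)
cs_inv, dp_inv = fit_inv(times, distances)
t_pred = predict_5000_time(cs_lin, dp_lin)
print(cs_lin, dp_lin, cs_inv, dp_inv, t_pred)
```

    On noise-free synthetic data both fits return the same CS and D', giving a predicted time of about 1,200 seconds. With real data the models weight the trials differently and generally give different estimates, which is why the study compares them.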

  13. Analyzing Multilevel Data: An Empirical Comparison of Parameter Estimates of Hierarchical Linear Modeling and Ordinary Least Squares Regression

    ERIC Educational Resources Information Center

    Rocconi, Louis M.

    2011-01-01

    Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…

  14. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    ERIC Educational Resources Information Center

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  15. Classical Testing in Functional Linear Models.

    PubMed

    Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

    2016-01-01

    We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.

  16. Classical Testing in Functional Linear Models

    PubMed Central

    Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

    2016-01-01

    We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155

  17. PREDICTING CHRONIC TOXICITY OF CHEMICALS TO FISHES FROM ACUTE TOXICITY TEST DATA: CONCEPT AND LINEAR REGRESSION

    EPA Science Inventory

    A comprehensive approach to predicting chronic toxicity from acute toxicity data was developed in which simultaneous consideration is given to concentration, degree of response, and time course of effect. Consistent endpoint (lethality) and degree of response (0%) were used to com...

  18. Comparative Research of Navy Voluntary Education at Operational Commands

    DTIC Science & Technology

    2017-03-01

    return on investment, ROI, logistic regression, multivariate analysis, descriptive statistics, Markov, time-series, linear programming ...

  19. A Quantitative Assessment of Student Performance and Examination Format

    ERIC Educational Resources Information Center

    Davison, Christopher B.; Dustova, Gandzhina

    2017-01-01

    This research study describes the correlations between student performance and examination format in a higher education teaching and research institution. The researchers employed a quantitative, correlational methodology utilizing linear regression analysis. The data was obtained from undergraduate student test scores over a three-year time span.…

  20. 40 CFR 86.1341-90 - Test cycle validation criteria.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...

  1. 40 CFR 86.1341-90 - Test cycle validation criteria.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 20 2013-07-01 2013-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...

  2. 40 CFR 86.1341-90 - Test cycle validation criteria.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 20 2012-07-01 2012-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...

  3. A FORTRAN program for multivariate survival analysis on the personal computer.

    PubMed

    Mulder, P G

    1988-01-01

    In this paper a FORTRAN program is presented for multivariate survival or life table regression analysis in a competing risks' situation. The relevant failure rate (for example, a particular disease or mortality rate) is modelled as a log-linear function of a vector of (possibly time-dependent) explanatory variables. The explanatory variables may also include the variable time itself, which is useful for parameterizing piecewise exponential time-to-failure distributions in a Gompertz-like or Weibull-like way as a more efficient alternative to Cox's proportional hazards model. Maximum likelihood estimates of the coefficients of the log-linear relationship are obtained from the iterative Newton-Raphson method. The program runs on a personal computer under DOS; running time is quite acceptable, even for large samples.

  4. Nonlinear multivariate and time series analysis by neural network methods

    NASA Astrophysics Data System (ADS)

    Hsieh, William W.

    2004-03-01

    Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.

  5. A Linear Regression and Markov Chain Model for the Arabian Horse Registry

    DTIC Science & Technology

    1993-04-01

    ...the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses. A linear regression model was utilized to...

  6. An improved multiple linear regression and data analysis computer program package

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  7. Optimizing the time-frame for the definition of bleeding-related death after acute variceal bleeding in cirrhosis.

    PubMed

    Merkel, C; Gatta, A; Bellumat, A; Bolognesi, M; Borsato, L; Caregaro, L; Cavallarin, G; Cielo, R; Cristina, P; Cucci, E; Donada, C; Donadon, V; Enzo, E; Martin, R; Mazzaro, C; Sacerdoti, D; Torboli, P

    1996-01-01

    To identify the best time-frame for defining bleeding-related death after variceal bleeding in patients with cirrhosis. Prospective long-term evaluation of a cohort of 155 patients admitted with variceal bleeding. Eight medical departments in seven hospitals in north-eastern Italy. Non-linear regression analysis of a hazard curve for death, and Cox's multiple regression analyses using different zero-time points. Cumulative hazard plots gave two slopes, the first corresponding to the risk of death from acute bleeding, the second a baseline risk of death. The first 30 days were outside the confidence limits of the regression curve for the baseline risk of death. Using Cox's regression analysis, the significant predictors of overall mortality risk were balanced between factors related to severity of bleeding and those related to severity of liver disease. If only deaths occurring after 30 days were considered, only predictors related to the severity of liver disease were found to be of importance. Thirty days after bleeding is considered to be a reasonable time-frame for the definition of bleeding-related death in patients with cirrhosis and variceal bleeding.

  8. Wavelet-based functional linear mixed models: an application to measurement error-corrected distributed lag models.

    PubMed

    Malloy, Elizabeth J; Morris, Jeffrey S; Adar, Sara D; Suh, Helen; Gold, Diane R; Coull, Brent A

    2010-07-01

    Frequently, exposure data are measured over time on a grid of discrete values that collectively define a functional observation. In many applications, researchers are interested in using these measurements as covariates to predict a scalar response in a regression setting, with interest focusing on the most biologically relevant time window of exposure. One example is in panel studies of the health effects of particulate matter (PM), where particle levels are measured over time. In such studies, there are many more values of the functional data than observations in the data set so that regularization of the corresponding functional regression coefficient is necessary for estimation. Additional issues in this setting are the possibility of exposure measurement error and the need to incorporate additional potential confounders, such as meteorological or co-pollutant measures, that themselves may have effects that vary over time. To accommodate all these features, we develop wavelet-based linear mixed distributed lag models that incorporate repeated measures of functional data as covariates into a linear mixed model. A Bayesian approach to model fitting uses wavelet shrinkage to regularize functional coefficients. We show that, as long as the exposure error induces fine-scale variability in the functional exposure profile and the distributed lag function representing the exposure effect varies smoothly in time, the model corrects for the exposure measurement error without further adjustment. Both these conditions are likely to hold in the environmental applications we consider. We examine properties of the method using simulations and apply the method to data from a study examining the association between PM, measured as hourly averages for 1-7 days, and markers of acute systemic inflammation. We use the method to fully control for the effects of confounding by other time-varying predictors, such as temperature and co-pollutants.

  9. Biostatistics Series Module 6: Correlation and Linear Regression.

    PubMed

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
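
    The quantities named in this abstract (Pearson's r, the coefficient of determination r², and the least-squares line y = a + bx) can be computed in a few lines. This is a minimal sketch. The data points are hypothetical and the function names are chosen for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def regression_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
a, b = regression_line(x, y)

# For simple linear regression, r squared equals 1 - SSE/SST.
mean_y = sum(y) / len(y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - mean_y) ** 2 for yi in y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, y = {a:.2f} + {b:.2f}x")
```

    The check against 1 - SSE/SST shows why r² is read as the share of variability in y attributable to its linear relation with x.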

  10. Biostatistics Series Module 6: Correlation and Linear Regression

    PubMed Central

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous. PMID:27904175

  11. Linear Regression Quantile Mapping (RQM) - A new approach to bias correction with consistent quantile trends

    NASA Astrophysics Data System (ADS)

    Passow, Christian; Donner, Reik

    2017-04-01

    Quantile mapping (QM) is an established concept that allows the correction of systematic biases in multiple quantiles of the distribution of a climatic observable. It shows remarkable results in correcting biases in historical simulations through observational data and outperforms simpler correction methods which relate only to the mean or variance. Since it has been shown that bias correction of future predictions or scenario runs with basic QM can result in misleading trends in the projection, adjusted, trend-preserving versions of QM were introduced in the form of detrended quantile mapping (DQM) and quantile delta mapping (QDM) (Cannon, 2015, 2016). Still, all previous versions and applications of QM-based bias correction rely on the assumption of time-independent quantiles over the investigated period, which can be misleading in the context of a changing climate. Here, we propose a novel combination of linear quantile regression (QR) with the classical QM method to introduce a consistent, time-dependent and trend-preserving approach of bias correction for historical and future projections. Since QR is a regression method, it is possible to estimate quantiles in the same resolution as the given data and include trends or other dependencies. We demonstrate the performance of the new method of linear regression quantile mapping (RQM) in correcting biases of temperature and precipitation products from historical runs (1959-2005) of the COSMO model in climate mode (CCLM) from the Euro-CORDEX ensemble relative to gridded E-OBS data of the same spatial and temporal resolution. A thorough comparison with established bias correction methods highlights the strengths and potential weaknesses of the new RQM approach. References: A.J. Cannon, S.R. Sorbie, T.Q. Murdock: Bias Correction of GCM Precipitation by Quantile Mapping - How Well Do Methods Preserve Changes in Quantiles and Extremes? Journal of Climate, 28, 6038, 2015. A.J. Cannon: Multivariate Bias Correction of Climate Model Outputs - Matching Marginal Distributions and Inter-variable Dependence Structure. Journal of Climate, 29, 7045, 2016.
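
    For orientation, classical empirical quantile mapping (the time-independent baseline that the abstract contrasts with RQM) can be sketched as follows. This is not the authors' RQM method. It simply maps a model value to the observed value at the same empirical quantile, and the sample values and function names are hypothetical.

```python
from bisect import bisect_right


def quantile(sorted_sample, p):
    """Empirical quantile with linear interpolation between order statistics."""
    n = len(sorted_sample)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1.0 - frac) + sorted_sample[hi] * frac


def quantile_level(sorted_sample, value):
    """Inverse of quantile(): the level p of a value, clipped to [0, 1]."""
    n = len(sorted_sample)
    if value <= sorted_sample[0]:
        return 0.0
    if value >= sorted_sample[-1]:
        return 1.0
    hi = bisect_right(sorted_sample, value)  # first element strictly above value
    lo = hi - 1
    frac = (value - sorted_sample[lo]) / (sorted_sample[hi] - sorted_sample[lo])
    return (lo + frac) / (n - 1)


def quantile_map(value, model_hist, obs_hist):
    """Bias-correct one model value: obs quantile at the model's quantile level."""
    p = quantile_level(sorted(model_hist), value)
    return quantile(sorted(obs_hist), p)


# Hypothetical historical samples: the model runs 2 units too warm.
obs_hist = [10.0, 12.0, 14.0, 16.0, 18.0]
model_hist = [12.0, 14.0, 16.0, 18.0, 20.0]
corrected = quantile_map(15.0, model_hist, obs_hist)
print(corrected)
```

    Because the quantiles here are estimated once over the whole historical period, this baseline has exactly the time-independence that the abstract identifies as a limitation.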

  12. Quantitative structure-retention relationship models for the prediction of the reversed-phase HPLC gradient retention based on the heuristic method and support vector machine.

    PubMed

    Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide

    2009-01-01

    The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models from a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many structurally diverse analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models was better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically guide practical experiments.

  13. Potential pitfalls when denoising resting state fMRI data using nuisance regression.

    PubMed

    Bright, Molly G; Tench, Christopher R; Murphy, Kevin

    2017-07-01

    In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which the fit is calculated of a noise model of head motion and physiological processes to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  14. Bone mineral density across a range of physical activity volumes: NHANES 2007-2010.

    PubMed

    Whitfield, Geoffrey P; Kohrt, Wendy M; Pettee Gabriel, Kelley K; Rahbar, Mohammad H; Kohl, Harold W

    2015-02-01

    The association between aerobic physical activity volume and bone mineral density (BMD) is not completely understood. The purpose of this study was to clarify the association between BMD and aerobic activity across a broad range of activity volumes, particularly volumes between those recommended in the 2008 Physical Activity Guidelines for Americans and those of trained endurance athletes. Data from the 2007-2010 National Health and Nutrition Examination Survey were used to quantify the association between reported physical activity and BMD at the lumbar spine and proximal femur across the entire range of activity volumes reported by US adults. Participants were categorized into multiples of the minimum guideline-recommended volume based on reported moderate- and vigorous-intensity leisure activity. Lumbar and proximal femur BMD were assessed with dual-energy x-ray absorptiometry. Among women, multivariable-adjusted linear regression analyses revealed no significant differences in lumbar BMD across activity categories, whereas proximal femur BMD was significantly higher among those who exceeded the guidelines by 2-4 times than those who reported no activity. Among men, multivariable-adjusted BMD at both sites neared its highest values among those who exceeded the guidelines by at least 4 times and was not progressively higher with additional activity. Logistic regression estimating the odds of low BMD generally echoed the linear regression results. The association between physical activity volume and BMD is complex. Among women, exceeding guidelines by 2-4 times may be important for maximizing BMD at the proximal femur, whereas among men, exceeding guidelines by ≥4 times may be beneficial for lumbar and proximal femur BMD.

  15. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

    ERIC Educational Resources Information Center

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)

  16. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    PubMed

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.

  17. Have the temperature time series a structural change after 1998?

    NASA Astrophysics Data System (ADS)

    Werner, Rolf; Valev, Dimitare; Danov, Dimitar

    2012-07-01

    The global and hemispheric temperature time series from GISS and HadCRUT3 were analysed for structural changes. We postulate that the temperature is a continuous function of time across the preceding segments. The slopes are calculated for a sequence of segments limited by time thresholds. We used a standard method, the restricted linear regression with dummy variables. We performed the calculations and tests for different numbers of thresholds. The thresholds are searched continuously in determined time intervals. The F-statistic is used to obtain the time points of the structural changes.
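
    A continuous piecewise-linear fit with a searched threshold can be sketched as below. This is an illustrative simplification, not the authors' procedure. It assumes a single threshold, enforces continuity with a hinge term max(0, t - τ), and compares the result with a single straight line through an F-type statistic. The data are synthetic, and because τ is chosen by search, the statistic should not be referred to a standard F table without adjustment.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_sse(columns, y):
    """Least-squares fit of y on design columns; returns (coefficients, SSE)."""
    k = len(columns)
    xtx = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(ci * yi for ci, yi in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][i] for j in range(k))
              for i in range(len(y))]
    return beta, sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_breakpoint(t, y, candidates):
    """Search thresholds; return (tau, coefficients, F vs. a single line)."""
    ones = [1.0] * len(t)
    _, sse_line = fit_sse([ones, t], y)
    best = None
    for tau in candidates:
        hinge = [max(0.0, ti - tau) for ti in t]  # slope change after tau
        beta, sse = fit_sse([ones, t, hinge], y)
        if best is None or sse < best[2]:
            best = (tau, beta, sse)
    tau, beta, sse = best
    f_stat = (sse_line - sse) / (sse / (len(t) - 3))  # one extra parameter
    return tau, beta, f_stat


# Synthetic series: slope 0.5 before t = 10, slope 2.0 afterwards, small noise.
t = [float(i) for i in range(20)]
y = [1.0 + 0.5 * ti + 1.5 * max(0.0, ti - 10.0) + 0.05 * (-1) ** i
     for i, ti in enumerate(t)]
tau, beta, f_stat = best_breakpoint(t, y, range(3, 17))
print(tau, [round(b, 2) for b in beta], round(f_stat, 1))
```

    The candidate thresholds are kept strictly inside the observed time range so that the hinge column is neither all zeros nor collinear with the trend term.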

  18. Geodesic regression for image time-series.

    PubMed

    Niethammer, Marc; Huang, Yang; Vialard, François-Xavier

    2011-01-01

    Registration of image-time series has so far been accomplished (i) by concatenating registrations between image pairs, (ii) by solving a joint estimation problem resulting in piecewise geodesic paths between image pairs, (iii) by kernel based local averaging or (iv) by augmenting the joint estimation with additional temporal irregularity penalties. Here, we propose a generative model extending least squares linear regression to the space of images by using a second-order dynamic formulation for image registration. Unlike previous approaches, the formulation allows for a compact representation of an approximation to the full spatio-temporal trajectory through its initial values. The method also opens up possibilities to design image-based approximation algorithms. The resulting optimization problem is solved using an adjoint method.

  19. Time-resolved perfusion imaging at the angiography suite: preclinical comparison of a new flat-detector application to computed tomography perfusion.

    PubMed

    Jürgens, Julian H W; Schulz, Nadine; Wybranski, Christian; Seidensticker, Max; Streit, Sebastian; Brauner, Jan; Wohlgemuth, Walter A; Deuerling-Zheng, Yu; Ricke, Jens; Dudeck, Oliver

    2015-02-01

    The objective of this study was to compare the parameter maps of a new flat-panel detector application for time-resolved perfusion imaging in the angiography room (FD-CTP) with computed tomography perfusion (CTP) in an experimental tumor model. Twenty-four VX2 tumors were implanted into the hind legs of 12 rabbits. Three weeks later, FD-CTP (Artis zeego; Siemens) and CTP (SOMATOM Definition AS +; Siemens) were performed. The parameter maps for the FD-CTP were calculated using a prototype software, and those for the CTP were calculated with VPCT-body software on a dedicated syngo MultiModality Workplace. The parameters were compared using Pearson product-moment correlation coefficient and linear regression analysis. The Pearson product-moment correlation coefficient showed good correlation values for both the intratumoral blood volume of 0.848 (P < 0.01) and the blood flow of 0.698 (P < 0.01). The linear regression analysis of the perfusion between FD-CTP and CTP showed for the blood volume a regression equation y = 4.44x + 36.72 (P < 0.01) and for the blood flow y = 0.75x + 14.61 (P < 0.01). This preclinical study provides evidence that FD-CTP allows a time-resolved (dynamic) perfusion imaging of tumors similar to CTP, which provides the basis for clinical applications such as the assessment of tumor response to locoregional therapies directly in the angiography suite.

  20. Evaluating abundance and trends in a Hawaiian avian community using state-space analysis

    USGS Publications Warehouse

    Camp, Richard J.; Brinck, Kevin W.; Gorresen, P.M.; Paxton, Eben H.

    2016-01-01

    Estimating population abundances and patterns of change over time are important in both ecology and conservation. Trend assessment typically entails fitting a regression to a time series of abundances to estimate population trajectory. However, changes in abundance estimates from year-to-year across time are due to both true variation in population size (process variation) and variation due to imperfect sampling and model fit. State-space models are a relatively new method that can be used to partition the error components and quantify trends based only on process variation. We compare a state-space modelling approach with a more traditional linear regression approach to assess trends in uncorrected raw counts and detection-corrected abundance estimates of forest birds at Hakalau Forest National Wildlife Refuge, Hawai‘i. Most species demonstrated similar trends using either method. In general, evidence for trends using state-space models was less strong than for linear regression, as measured by estimates of precision. However, while the state-space models may sacrifice precision, the expectation is that these estimates provide a better representation of the real world biological processes of interest because they are partitioning process variation (environmental and demographic variation) and observation variation (sampling and model variation). The state-space approach also provides annual estimates of abundance which can be used by managers to set conservation strategies, and can be linked to factors that vary by year, such as climate, to better understand processes that drive population trends.

  1. A statistical model for Windstorm Variability over the British Isles based on Large-scale Atmospheric and Oceanic Mechanisms

    NASA Astrophysics Data System (ADS)

    Kirchner-Bossi, Nicolas; Befort, Daniel J.; Wild, Simon B.; Ulbrich, Uwe; Leckebusch, Gregor C.

    2016-04-01

    Time-clustered winter storms are responsible for a majority of the wind-induced losses in Europe. Over recent years, different atmospheric and oceanic large-scale mechanisms such as the North Atlantic Oscillation (NAO) or the Meridional Overturning Circulation (MOC) have been shown to drive a significant portion of the windstorm variability over Europe. In this work we systematically investigate the influence of different large-scale natural variability modes: more than 20 indices related to those mechanisms with proven or potential influence on the windstorm frequency variability over Europe - mostly SST- or pressure-based - are derived by means of ECMWF ERA-20C reanalysis during the last century (1902-2009), and compared to the windstorm variability for the European winter (DJF). Windstorms are defined and tracked as in Leckebusch et al. (2008). The derived indices are then employed to develop a statistical procedure including a stepwise Multiple Linear Regression (MLR) and an Artificial Neural Network (ANN), aiming to hindcast the inter-annual (DJF) regional windstorm frequency variability in a case study for the British Isles. This case study reveals 13 indices with a statistically significant coupling with seasonal windstorm counts. The Scandinavian Pattern (SCA) showed the strongest correlation (0.61), followed by the NAO (0.48) and the Polar/Eurasia Pattern (0.46). The obtained indices (standard-normalised) are selected as predictors for a windstorm variability hindcast model applied for the British Isles. First, a stepwise linear regression is performed, to identify which mechanisms can explain windstorm variability best. Finally, the indices retained by the stepwise regression are used to develop a multilayer perceptron-based ANN that hindcasted seasonal windstorm frequency and clustering. Eight indices (SCA, NAO, EA, PDO, W.NAtl.SST, AMO (unsmoothed), EA/WR and Trop.N.Atl SST) are retained by the stepwise regression. Among them, SCA showed the highest linear coefficient, followed by SST in western Atlantic, AMO and NAO. The explanatory regression model (considering all time steps) provided a Coefficient of Determination (R^2) of 0.75. A predictive version of the linear model applying a leave-one-out cross-validation (LOOCV) shows an R^2 of 0.56 and a relative RMSE of 4.67 counts/season. An ANN-based nonlinear hindcast model for the seasonal windstorm frequency is developed with the aim to improve the stepwise hindcast ability and thus better predict a time-clustered season over the case study. A perceptron with a 7-node hidden layer is set up, and the LOOCV procedure reveals an R^2 of 0.71. In comparison to the stepwise MLR, the RMSE is reduced by 20%. This work shows that for the British Isles case study, most of the interannual variability can be explained by certain large-scale mechanisms, considering also nonlinear effects (ANN). This makes it possible to discern a time-clustered season from a non-clustered one - a key issue for applications e.g., in the (re)insurance industry.
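
    The leave-one-out cross-validation mentioned in this abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure: the function name `loocv_r2_rmse`, the three synthetic predictors, and the choice of cross-validated R^2 as 1 - PRESS/SS_tot are all assumptions made here for the example.

```python
import numpy as np

def loocv_r2_rmse(X, y):
    """Leave-one-out cross-validation for an ordinary least-squares fit.

    Each observation is predicted from a model fitted to all the others.
    Returns (cross-validated R^2, RMSE of the held-out predictions).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i              # drop observation i
        coef = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]
        preds[i] = A[i] @ coef                # predict the held-out point
    ss_res = np.sum((y - preds) ** 2)         # PRESS statistic
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(ss_res / n)

# Synthetic stand-in for standardised indices and seasonal counts (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.5, -0.8, 0.4]) + rng.normal(scale=0.5, size=100)

r2_cv, rmse_cv = loocv_r2_rmse(X, y)
print(f"LOOCV R^2 = {r2_cv:.2f}, RMSE = {rmse_cv:.2f}")
```

    The cross-validated R^2 is normally lower than the in-sample value, which is the pattern the abstract reports (0.75 for the explanatory model versus 0.56 under LOOCV).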

  2. Simulation of multi-stage nonlinear bone remodeling induced by fixed partial dentures of different configurations: a comparative clinical and numerical study.

    PubMed

    Liao, Zhipeng; Yoda, Nobuhiro; Chen, Junning; Zheng, Keke; Sasaki, Keiichi; Swain, Michael V; Li, Qing

    2017-04-01

    This paper aimed to develop a clinically validated bone remodeling algorithm by integrating bone's dynamic properties in a multi-stage fashion based on a four-year clinical follow-up of implant treatment. The configurational effects of fixed partial dentures (FPDs) were explored using a multi-stage remodeling rule. Three-dimensional real-time occlusal loads during maximum voluntary clenching were measured with a piezoelectric force transducer and were incorporated into a computerized tomography-based finite element mandibular model. Virtual X-ray images were generated based on simulation and statistically correlated with clinical data using linear regressions. The strain energy density-driven remodeling parameters were regulated over the time frame considered. A linear single-stage bone remodeling algorithm, with a single set of constant remodeling parameters, was found to poorly fit with clinical data through linear regression (low [Formula: see text] and R), whereas a time-dependent multi-stage algorithm better simulated the remodeling process (high [Formula: see text] and R) against the clinical results. The three-implant-supported and distally cantilevered FPDs presented noticeable and continuous bone apposition, mainly adjacent to the cervical and apical regions. The bridged and mesially cantilevered FPDs showed bone resorption or no visible bone formation in some areas. Time-dependent variation of bone remodeling parameters is recommended to better correlate remodeling simulation with clinical follow-up. The position of FPD pontics plays a critical role in mechanobiological functionality and bone remodeling. Caution should be exercised when selecting the cantilever FPD due to the risk of overloading bone resorption.

  3. Computation of nonlinear least squares estimator and maximum likelihood using principles in matrix calculus

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.

    2017-11-01

    This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE) and a linear pseudo model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However, the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to get a linear pseudo model for the nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for the nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the LSE (least-squares estimation) for the fitting of a nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation

  4. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    PubMed

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.

  5. Net analyte signal-based simultaneous determination of ethanol and water by quartz crystal nanobalance sensor.

    PubMed

    Mirmohseni, A; Abdollahi, H; Rostamizadeh, K

    2007-02-28

    A net analyte signal (NAS)-based method called HLA/GO was applied for the selective determination of a binary mixture of ethanol and water by a quartz crystal nanobalance (QCN) sensor. A full factorial design was applied for the formation of calibration and prediction sets in the concentration ranges 5.5-22.2 microg mL(-1) for ethanol and 7.01-28.07 microg mL(-1) for water. An optimal time range was selected by a procedure based on the calculation of the net analyte signal regression plot in any considered time window for each test sample. A moving window strategy was used to search for the region with maximum linearity of the NAS regression plot (minimum error indicator) and minimum PRESS value. On the basis of the obtained results, the differences in the adsorption profiles in the time range between 1 and 600 s were used to determine mixtures of both compounds by the HLA/GO method. The calculation of the net analyte signal using the HLA/GO method allows determination of several figures of merit, such as selectivity, sensitivity, analytical sensitivity and limit of detection, for each component. To check the ability of the proposed method to select linear regions of the adsorption profile, a test for detecting non-linear regions of adsorption profile data in the presence of methanol was also described. The results showed that the method was successfully applied for the determination of ethanol and water.

  6. Can cover data be used as a surrogate for seedling counts in regeneration stocking evaluations in northern hardwood forests?

    Treesearch

    Todd E. Ristau; Susan L. Stout

    2014-01-01

    Assessment of regeneration can be time-consuming and costly. Often, foresters look for ways to minimize the cost of doing inventories. One potential method to reduce time required on a plot is use of percent cover data rather than seedling count data to determine stocking. Robust linear regression analysis was used in this report to predict seedling count data from...

  7. Assessing the performance of eight real-time updating models and procedures for the Brosna River

    NASA Astrophysics Data System (ADS)

    Goswami, M.; O'Connor, K. M.; Bhattarai, K. P.; Shamseldin, A. Y.

    2005-10-01

    The flow forecasting performance of eight updating models, incorporated in the Galway River Flow Modelling and Forecasting System (GFMFS), was assessed using daily data (rainfall, evaporation and discharge) of the Irish Brosna catchment (1207 km2), considering their one to six days lead-time discharge forecasts. The Perfect Forecast of Input over the Forecast Lead-time scenario was adopted, where required, in place of actual rainfall forecasts. The eight updating models were: (i) the standard linear Auto-Regressive (AR) model, applied to the forecast errors (residuals) of a simulation (non-updating) rainfall-runoff model; (ii) the Neural Network Updating (NNU) model, also using such residuals as input; (iii) the Linear Transfer Function (LTF) model, applied to the simulated and the recently observed discharges; (iv) the Non-linear Auto-Regressive eXogenous-Input Model (NARXM), also a neural network-type structure, but having wide options of using recently observed values of one or more of the three data series, together with non-updated simulated outflows, as inputs; (v) the Parametric Simple Linear Model (PSLM), of LTF-type, using recent rainfall and observed discharge data; (vi) the Parametric Linear perturbation Model (PLPM), also of LTF-type, using recent rainfall and observed discharge data, (vii) n-AR, an AR model applied to the observed discharge series only, as a naïve updating model; and (viii) n-NARXM, a naive form of the NARXM, using only the observed discharge data, excluding exogenous inputs. The five GFMFS simulation (non-updating) models used were the non-parametric and parametric forms of the Simple Linear Model and of the Linear Perturbation Model, the Linearly-Varying Gain Factor Model, the Artificial Neural Network Model, and the conceptual Soil Moisture Accounting and Routing (SMAR) model. 
As the SMAR model performance was found to be the best among these models, in terms of the Nash-Sutcliffe R2 value, both in calibration and in verification, the simulated outflows of this model only were selected for the subsequent exercise of producing updated discharge forecasts. All the eight forms of updating models for producing lead-time discharge forecasts were found to be capable of producing relatively good lead-1 (1-day ahead) forecasts, with R2 values almost 90% or above. However, for higher lead time forecasts, only three updating models, viz., NARXM, LTF, and NNU, were found to be suitable, with lead-6 values of R2 about 90% or higher. Graphical comparisons were made of the lead-time forecasts for the two largest floods, one in the calibration period and the other in the verification period.

  8. Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

    NASA Astrophysics Data System (ADS)

    Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

    2013-02-01

    Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. 
The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
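
    The idea behind geographically weighted regression described in this abstract can be sketched as a separate weighted least-squares fit at each location, with nearby observations weighted more heavily. This is a minimal illustration only: it uses a fixed Gaussian bandwidth rather than the adaptive bandwidth the abstract mentions, and the function name `gwr_coefficients`, the bandwidth value, and the synthetic data are assumptions made here.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local regression coefficients at every observation location.

    For location i, observation j gets weight exp(-0.5 * (d_ij / bandwidth)^2),
    and a weighted least-squares fit is solved with those weights.
    Returns an array of shape (n, 1 + number of predictors).
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to site i
        sw = np.sqrt(np.exp(-0.5 * (d / bandwidth) ** 2))
        # Scaling rows by sqrt(weight) turns weighted LS into ordinary LS.
        betas[i] = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return betas

# Synthetic example: the slope on x drifts from 1 in the west to 3 in the east.
rng = np.random.default_rng(1)
coords = rng.uniform(size=(400, 2))
x = rng.normal(size=400)
true_slope = 1.0 + 2.0 * coords[:, 0]
y = true_slope * x + rng.normal(scale=0.1, size=400)

betas = gwr_coefficients(coords, x, y, bandwidth=0.15)
global_slope = np.polyfit(x, y, 1)[0]    # single "stationary" slope for contrast
print(f"global slope: {global_slope:.2f}")
print(f"local slopes range: {betas[:, 1].min():.2f} to {betas[:, 1].max():.2f}")
```

    A single global fit returns one averaged slope, whereas the local fits recover the west-to-east drift, which is the kind of non-stationarity the abstract says GWR is meant to expose.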

  9. Effect of mobile phone use on metal ion release from fixed orthodontic appliances.

    PubMed

    Saghiri, Mohammad Ali; Orangi, Jafar; Asatourian, Armen; Mehriar, Peiman; Sheibani, Nader

    2015-06-01

    The aim of this study was to evaluate the effect of exposure to radiofrequency electromagnetic fields emitted by mobile phones on the level of nickel in saliva. Fifty healthy patients with fixed orthodontic appliances were asked not to use their cell phones for a week, and their saliva samples were taken at the end of the week (control group). The patients recorded their time of mobile phone usage during the next week and returned for a second saliva collection (experimental group). Samples at both times were taken between 8:00 and 10:00 pm, and the nickel levels were measured. Two-tailed paired-samples t test, linear regression, independent t test, and 1-way analysis of variance were used for data analysis. The 2-tailed paired-samples t test showed significant differences between the levels of nickel in the control and experimental groups (t [49] = 9.967; P <0.001). The linear regression test showed a significant relationship between mobile phone usage time and the nickel release (F [1, 48] = 60.263; P <0.001; R(2) = 0.577). Mobile phone usage has a time-dependent influence on the concentration of nickel in the saliva of patients with orthodontic appliances. Copyright © 2015 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.

  10. GIS Tools to Estimate Average Annual Daily Traffic

    DOT National Transportation Integrated Search

    2012-06-01

    This project presents five tools that were created for a geographical information system to estimate Annual Average Daily Traffic using linear regression. Three of the tools can be used to prepare spatial data for linear regression. One tool can be...

  11. Estimating extent of mortality associated with the Douglas-fir beetle in the Central and Northern Rockies

    Treesearch

    Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini

    1999-01-01

    Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...

  12. Development and evaluation of a reservoir model for the Chain of Lakes in Illinois

    USGS Publications Warehouse

    Domanski, Marian M.

    2017-01-27

    Forecasts of flows entering and leaving the Chain of Lakes reservoir on the Fox River in northeastern Illinois are critical information to water-resource managers who determine the optimal operation of the dam at McHenry, Illinois, to help minimize damages to property and loss of life because of flooding on the Fox River. In 2014, the U.S. Geological Survey; the Illinois Department of Natural Resources, Office of Water Resources; and National Weather Service, North Central River Forecast Center began a cooperative study to develop a system to enable engineers and planners to simulate and communicate flows and to prepare proactively for precipitation events in near real time in the upper Fox River watershed. The purpose of this report is to document the development and evaluation of the Chain of Lakes reservoir model developed in this study. The reservoir model for the Chain of Lakes was developed using the Hydrologic Engineering Center–Reservoir System Simulation program. Because of the complex relation between the dam headwater and reservoir pool elevations, the reservoir model uses a linear regression model that relates dam headwater elevation to reservoir pool elevation. The linear regression model was developed using 17 U.S. Geological Survey streamflow measurements, along with the gage height in the reservoir pool and the gage height at the dam headwater. The Nash-Sutcliffe model efficiency coefficients for all three linear regression model variables ranged from 0.90 to 0.98. The reservoir model performance was evaluated by graphically comparing simulated and observed reservoir pool elevation time series during nine periods of high pool elevation. In addition, the peak elevations during these time periods were graphically compared to the closest-in-time observed pool elevation peak. The mean difference in the simulated and observed peak elevations was -0.03 feet, with a standard deviation of 0.19 feet. The Nash-Sutcliffe coefficient for peak prediction was calculated as 0.94. Evaluation of the model based on accuracy of peak prediction and the ability to simulate an elevation time series showed the performance of the model was satisfactory.
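
    The Nash-Sutcliffe efficiency cited in this abstract is commonly defined as one minus the ratio of the squared simulation error to the variance of the observations about their mean. A minimal sketch follows; the function name and the toy numbers are illustrative assumptions, not data from the report.

```python
import numpy as np

def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, 0 means the
    simulation is no better than predicting the observed mean."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Toy values standing in for observed and simulated pool elevations.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

nse = nash_sutcliffe(obs, sim)
print(f"NSE = {nse:.2f}")
```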

  13. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    PubMed

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct a prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires, and multiple linear regression analysis with 20 quotas selected by literature review was done. Independent variables in the multiple linear regression model on medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.

  14. Uveal Melanoma Regression after Brachytherapy: Relationship with Chromosome 3 Monosomy Status.

    PubMed

    Salvi, Sachin M; Aziz, Hassan A; Dar, Suhail; Singh, Nakul; Hayden-Loreck, Brandy; Singh, Arun D

    2017-07-01

    The objective was to evaluate the relationship between the regression rate of ciliary body melanoma and choroidal melanoma after brachytherapy and chromosome 3 monosomy status. We conducted a prospective and consecutive case series of patients who underwent biopsy and brachytherapy for ciliary/choroidal melanoma. Tumor biopsy performed at the time of radiation plaque placement was analyzed with fluorescence in situ hybridization to determine the percentage of tumor cells with chromosome 3 monosomy. The regression rate was calculated as the percent change in tumor height at months 3, 6, and 12. The relationship between regression rate and tumor location, initial tumor height, and chromosome 3 monosomy (percentage) was assessed by univariate linear regression (R version 3.1.0). Of the 75 patients included in the study, 8 had ciliary body melanoma, and 67 were choroidal melanomas. The mean tumor height at the time of diagnosis was 5.2 mm (range: 1.90-13.00). The percentage composition of chromosome 3 monosomy ranged from 0-20% (n = 35) to 81-100% (n = 40). The regression of tumor height at months 3, 6, and 12 did not statistically correlate with tumor location (ciliary or choroidal), initial tumor height, or chromosome 3 monosomy (percentage). The regression rate of choroidal melanoma following brachytherapy did not correlate with chromosome 3 monosomy status.

  15. A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

    PubMed

    Watanabe, Hiroyuki; Miyazaki, Hiroyasu

    2006-01-01

    Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
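
    For context, the two fixed-exponent formulas named in this abstract are usually written as QTc = QT / RR^(1/2) (Bazett) and QTc = QT / RR^(1/3) (Fridericia), with RR in seconds. The sketch below shows only these textbook forms; the function names and sample values are assumptions for illustration, and the abstract does not state how the authors parameterised them for beagle dogs.

```python
def qtc_bazett(qt_ms, rr_s):
    """Bazett correction: divide QT by the square root of the RR interval (s)."""
    return qt_ms / rr_s ** 0.5

def qtc_fridericia(qt_ms, rr_s):
    """Fridericia correction: divide QT by the cube root of the RR interval (s)."""
    return qt_ms / rr_s ** (1.0 / 3.0)

# Hypothetical observation: QT = 320 ms at RR = 0.64 s (about 94 beats/min).
qt_ms, rr_s = 320.0, 0.64
print(f"Bazett:     {qtc_bazett(qt_ms, rr_s):.1f} ms")
print(f"Fridericia: {qtc_fridericia(qt_ms, rr_s):.1f} ms")
```

    Both formulas leave QT unchanged at RR = 1 s and diverge as heart rate moves away from 60 beats/min, which is where a fixed exponent is most likely to over- or under-correct.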

  16. Does Familism Lead to Increased Parental Monitoring?: Protective Factors for Coping with Risky Behaviors

    ERIC Educational Resources Information Center

    Romero, Andrea J.; Ruiz, Myrna

    2007-01-01

    We examined coping with risky behaviors (cigarettes, alcohol/drugs, yelling/ hitting, and anger), familism (family proximity and parental closeness) and parental monitoring (knowledge and discipline) in a sample of 56 adolescents (11-15 years old) predominantly of Mexican descent at two time points. Multiple linear regression analysis indicated…

  17. Bioassay of the Nucleopolyhedrosis Virus of Neodiprion sertifer (Hymenoptera: Diprionidae)

    Treesearch

    M.A. Mohamed; J.D. Podgwaite

    1982-01-01

    Linear regression analysis of probit mortality versus several concentrations of nucleopolyhedrosis virus of Neodiprion sertifer resulted in the equation Y = 2.170 + 0.872X. An LC50 was calculated at 1758 PIB/ml. Also, the incubation time of the virus was dependent on its concentration. Most insect viruses possess the potential...
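
    The step from the fitted line to an LC50 can be illustrated with a short calculation. It rests on two assumptions not stated in the abstract: that X is the base-10 logarithm of concentration in PIB/ml, and that Y is on the classical probit scale where 5 corresponds to 50% mortality. The function name is hypothetical.

```python
# Coefficients of the reported line Y = 2.170 + 0.872 X.
INTERCEPT = 2.170
SLOPE = 0.872

def concentration_at_probit(probit, intercept=INTERCEPT, slope=SLOPE):
    """Invert Y = intercept + slope * X for X, then undo the assumed log10."""
    log10_conc = (probit - intercept) / slope
    return 10.0 ** log10_conc

# Probit 5 corresponds to 50% mortality on the classical probit scale.
lc50 = concentration_at_probit(5.0)
print(f"LC50 under these assumptions: {lc50:.0f} PIB/ml")
```

    Under these assumptions the result lands within a few PIB/ml of the reported 1758 PIB/ml; a small gap is what one would expect from coefficients rounded to three decimals.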

  18. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    PubMed

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.

  19. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    PubMed

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.

  20. An Ionospheric Index Model based on Linear Regression and Neural Network Approaches

    NASA Astrophysics Data System (ADS)

    Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John

    2017-04-01

    The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the presence of electrons and ions within the region. To optimise the use of long distance HF communications, it is important to understand the drivers of ionospheric storms and accurately predict the propagation conditions, especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for HF communication users. The model will result in a valuable tool to measure the complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of the ionospheric storm-time index is based on data from a single ionosonde station at Grahamstown (33.3°S, 26.5°E), South Africa. Critical frequency of the F2 layer (foF2) measurements for the period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk, validation results for low, medium and high solar activity periods will be discussed to demonstrate the model's performance.

  1. Teaching High School Students Machine Learning Algorithms to Analyze Flood Risk Factors in River Deltas

    NASA Astrophysics Data System (ADS)

    Rose, R.; Aizenman, H.; Mei, E.; Choudhury, N.

    2013-12-01

    High School students interested in the STEM fields benefit most when actively participating, so I created a series of learning modules on how to analyze complex systems using machine-learning that give automated feedback to students. The automated feedbacks give timely responses that will encourage the students to continue testing and enhancing their programs. I have designed my modules to take the tactical learning approach in conveying the concepts behind correlation, linear regression, and vector distance based classification and clustering. On successful completion of these modules, students will learn how to calculate linear regression, Pearson's correlation, and apply classification and clustering techniques to a dataset. Working on these modules will allow the students to take back to the classroom what they've learned and then apply it to the Earth Science curriculum. During my research this summer, we applied these lessons to analyzing river deltas; we looked at trends in the different variables over time, looked for similarities in NDVI, precipitation, inundation, runoff and discharge, and attempted to predict floods based on the precipitation, waves mean, area of discharge, NDVI, and inundation.

  2. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    PubMed Central

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries at different scales. Then, principal component analysis (PCA) is used in processing the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to obtain the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666

  3. Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment

    NASA Astrophysics Data System (ADS)

    Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty

    2017-12-01

    Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can address both classification and regression problems, with either linear or nonlinear kernels. However, SVM also has a weakness: its optimal parameter values are difficult to determine. SVM calculates the best linear separator in the input feature space according to the training data. To classify data that are not linearly separable, SVM uses the kernel trick to transform the data into linearly separable data in a higher-dimensional feature space. The kernel trick uses various kinds of kernel functions, such as linear, polynomial, radial basis function (RBF) and sigmoid kernels. Each function has parameters which affect the accuracy of SVM classification. To solve this problem, genetic algorithms are proposed as the search algorithm for the optimal parameter values, thus increasing the classification accuracy of SVM. Data were taken from the UCI machine learning repository: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms have been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracy for the data has been upgraded from kernel linear: 85.12%, polynomial: 81.76%, RBF: 77.22%, sigmoid: 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.

  4. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    PubMed

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. 
    The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves than on the coefficients. Moreover, use of cubic regression splines provides biologically meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.

  5. Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach

    NASA Astrophysics Data System (ADS)

    Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew

    2017-05-01

    This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.

  6. Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.

  7. Time series regression model for infectious disease and weather.

    PubMed

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro

    2015-10-01

    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
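
    The two covariate modifications mentioned in this abstract (sums of past cases as an immunity proxy, and the logarithm of lagged counts to control autocorrelation) can be illustrated with a small feature-construction sketch. The function name, the lag, the window length and the +1 offset that avoids log(0) are assumptions made for illustration; the abstract does not specify them.

```python
import math

def build_covariates(cases, lag=1, immunity_window=4, offset=1.0):
    """Build (t, log_lagged_cases, past_case_sum) rows from a count series.

    log_lagged_cases: log(cases[t - lag] + offset), a proxy for contagion.
    past_case_sum:    sum of the previous `immunity_window` counts, a crude
                      proxy for the recently immune population.
    All parameter defaults are hypothetical choices for this sketch.
    """
    rows = []
    start = max(lag, immunity_window)  # first t with both covariates defined
    for t in range(start, len(cases)):
        log_lagged = math.log(cases[t - lag] + offset)
        past_sum = sum(cases[t - immunity_window:t])
        rows.append((t, log_lagged, past_sum))
    return rows

# Hypothetical weekly case counts.
weekly_cases = [0, 2, 5, 9, 4, 1]
for row in build_covariates(weekly_cases, lag=1, immunity_window=3):
    print(row)
```

    The resulting columns would then enter a count regression (for example quasi-Poisson or negative binomial, as the abstract suggests) alongside the weather terms; fitting that model is outside this sketch.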

  8. Predicting adult pulmonary ventilation volume and wearing compliance by on-board accelerometry during personal level exposure assessments

    NASA Astrophysics Data System (ADS)

    Rodes, C. E.; Chillrud, S. N.; Haskell, W. L.; Intille, S. S.; Albinali, F.; Rosenberger, M. E.

    2012-09-01

    Background: Metabolic functions typically increase with human activity, but optimal methods to characterize activity levels for real-time predictions of ventilation volume (l min⁻¹) during exposure assessments have not been available. Could tiny, triaxial accelerometers be incorporated into personal level monitors to define periods of acceptable wearing compliance, and allow the exposures (μg m⁻³) to be extended to potential doses in μg min⁻¹ kg⁻¹ of body weight? Objectives: In a pilot effort, we tested: 1) whether appropriately-processed accelerometer data could be utilized to predict compliance and in linear regressions to predict ventilation volumes in real-time as an on-board component of personal level exposure sensor systems, and 2) whether locating the exposure monitors on the chest in the breathing zone provided comparable accelerometric data to other locations more typically utilized (waist, thigh, wrist, etc.). Methods: Prototype exposure monitors from RTI International and Columbia University were worn on the chest by a pilot cohort of adults while conducting an array of scripted activities (all <10 METS), spanning common recumbent, sedentary, and ambulatory activity categories. Referee Wocket accelerometers that were placed at various body locations allowed comparison with the chest-located exposure sensor accelerometers. An Oxycon Mobile mask was used to measure oral-nasal ventilation volumes in-situ. For the subset of participants with complete data (n = 22), linear regressions were constructed (processed accelerometric variable versus ventilation rate) for each participant and exposure monitor type, and Pearson correlations computed to compare across scenarios. Results: Triaxial accelerometer data were demonstrated to be adequately sensitive indicators for predicting exposure monitor wearing compliance.
    Strong linear correlations (R values from 0.77 to 0.99) were observed for all participants for both exposure sensor accelerometer variables against ventilation volume for recumbent, sedentary, and ambulatory activities with MET values ≲6. The RTI monitors' mean R value of 0.91 was slightly higher than the Columbia monitors' mean of 0.86 due to utilizing a 20 Hz data rate instead of a slower 1 Hz rate. A nominal mean regression slope was computed for the RTI system across participants and showed a modest RSD of ±36.6%. Comparison of the correlation values of the exposure monitors with the Wocket accelerometers at various body locations showed statistically identical regressions for all sensors at alternate hip, ankle, upper arm, thigh, and pocket locations, but not for the Wocket accelerometer located at the dominant side wrist location (R = 0.57; p = 0.016). Conclusions: Even with a modest number of adult volunteers, the consistency and linearity of regression slopes for all subjects were very good with excellent within-person Pearson correlations for the accelerometer versus ventilation volume data. Computing accelerometric standard deviations allowed good sensitivity for compliance assessments even for sedentary activities. These pilot findings supported the hypothesis that a common linear regression is likely to be usable for a wider range of adults to predict ventilation volumes from accelerometry data over a range of low to moderate energy level activities. The predicted volumes would then allow real-time estimates of potential dose, enabling more robust panel studies. The poorer correlation in predicting ventilation rate for an accelerometer located on the wrist suggested that this location should not be considered for predictions of ventilation volume.
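
    The per-participant analysis described in this abstract (a linear regression and a Pearson correlation for each participant, then the relative standard deviation of the slopes) might be sketched as follows. The participant IDs and data values are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
import math
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept, plus Pearson r."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def relative_sd_percent(values):
    """Relative standard deviation (sample SD / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical data: processed accelerometer variable vs. ventilation (l/min).
participants = {
    "p01": ([0.1, 0.5, 1.0, 1.5], [8.0, 14.0, 22.0, 29.0]),
    "p02": ([0.1, 0.4, 0.9, 1.6], [7.0, 13.0, 24.0, 40.0]),
}
fits = {pid: fit_line(x, y) for pid, (x, y) in participants.items()}
slopes = [slope for slope, _, _ in fits.values()]
for pid, (slope, intercept, r) in fits.items():
    print(f"{pid}: slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
print(f"RSD of slopes: {relative_sd_percent(slopes):.1f}%")
```

    A small RSD across participants is what would support using one common regression for a wider population, which is the hypothesis the abstract raises.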

  9. Use of non-linear mixed-effects modelling and regression analysis to predict the number of somatic coliphages by plaque enumeration after 3 hours of incubation.

    PubMed

    Mendez, Javier; Monleon-Getino, Antonio; Jofre, Juan; Lucena, Francisco

    2017-10-01

    The present study aimed to establish the kinetics of the appearance of coliphage plaques using the double agar layer titration technique to evaluate the feasibility of using traditional coliphage plaque forming unit (PFU) enumeration as a rapid quantification method. Repeated measurements of the appearance of plaques of coliphages titrated according to ISO 10705-2 at different times were analysed using non-linear mixed-effects regression to determine the most suitable model of their appearance kinetics. Although this model is adequate, to simplify its applicability two linear models were developed to predict the numbers of coliphages reliably, using the PFU counts as determined by the ISO after only 3 hours of incubation. One linear model, when the number of plaques detected was between 4 and 26 PFU after 3 hours, had a linear fit of: (1.48 × Counts 3 h + 1.97); and the other, values >26 PFU, had a fit of (1.18 × Counts 3 h + 2.95). If the number of plaques detected was <4 PFU after 3 hours, we recommend incubation for (18 ± 3) hours. The study indicates that the traditional coliphage plating technique has a reasonable potential to provide results in a single working day without the need to invest in additional laboratory equipment.
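
    The two linear models quoted in this abstract can be written as a single piecewise function. The coefficients and thresholds come from the abstract; the function name, the inclusive handling of the 4 and 26 PFU boundaries, and the use of None to signal "incubate for the full period" are assumptions of this sketch.

```python
from typing import Optional

def predict_final_pfu(counts_3h: float) -> Optional[float]:
    """Predict the final plaque count from the count after 3 hours.

    Returns None below 4 PFU, where the abstract recommends the
    (18 +/- 3) hour incubation instead of a prediction.
    """
    if counts_3h < 4:
        return None
    if counts_3h <= 26:  # boundary treated as inclusive (assumption)
        return 1.48 * counts_3h + 1.97
    return 1.18 * counts_3h + 2.95

for counts in (3, 10, 30):
    print(counts, predict_final_pfu(counts))
```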

  10. Weighted functional linear regression models for gene-based association analysis.

    PubMed

    Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

    2018-01-01

    Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10⁻⁶), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.

  11. A reliable and cost effective approach for radiographic monitoring in nutritional rickets

    PubMed Central

    Gupta, V; Sharma, V; Sinha, B; Samanta, S

    2014-01-01

    Objective: Radiological scoring is particularly useful in rickets, where pre-treatment radiographical findings can reflect the disease severity and can be used to monitor the improvement. However, there is only a single radiographic scoring system for rickets developed by Thacher and, to the best of our knowledge, no study has evaluated radiographic changes in rickets based on this scoring system apart from the one done by Thacher himself. The main objective of this study is to compare and analyse the pre-treatment and post-treatment radiographic parameters in nutritional rickets with the help of Thacher's scoring technique. Methods: 176 patients with nutritional rickets were given a single intramuscular injection of vitamin D (600 000 IU) along with oral calcium (50 mg kg−1) and vitamin D (400 IU per day) until radiological resolution and followed for 1 year. Pre- and post-treatment radiological parameters were compared and analysed statistically based on Thacher's scoring system. Results: Radiological resolution was complete by 6 months. Time for radiological resolution and initial radiological score were linearly associated on regression analysis. The distal ulna was the last to heal in most cases except when the initial score was 10, when distal femur was the last to heal. Conclusion: Thacher's scoring system can effectively monitor nutritional rickets. The formula derived through linear regression has prognostic significance. Advances in knowledge: The distal femur is a better indicator in radiologically severe rickets and when resolution is delayed. Thacher's scoring is very useful for monitoring of rickets. The formula derived through linear regression can predict the expected time for radiological resolution. PMID:24593231

  12. Time Series Analysis of Soil Radon Data Using Multiple Linear Regression and Artificial Neural Network in Seismic Precursory Studies

    NASA Astrophysics Data System (ADS)

    Singh, S.; Jaishi, H. P.; Tiwari, R. P.; Tiwari, R. C.

    2017-07-01

    This paper reports the analysis of soil radon data recorded in the seismic zone-V, located in the northeastern part of India (latitude 23.73N, longitude 92.73E). Continuous measurements of soil-gas emission along the Chite fault in Mizoram (India) were carried out with the replacement of solid-state nuclear track detectors at weekly intervals. The present study was done for the period from March 2013 to May 2015 using LR-115 Type II detectors, manufactured by Kodak Pathe, France. In order to reduce the influence of meteorological parameters, statistical analysis tools such as multiple linear regression and artificial neural network have been used. A decrease in radon concentration was recorded prior to some earthquakes that occurred during the observation period. Some false anomalies were also recorded, which may be attributed to ongoing crustal deformation that was not large enough to produce an earthquake.

  13. Compulsive buying: Earlier illicit drug use, impulse buying, depression, and adult ADHD symptoms.

    PubMed

    Brook, Judith S; Zhang, Chenshu; Brook, David W; Leukefeld, Carl G

    2015-08-30

    This longitudinal study examined the association between psychosocial antecedents, including illicit drug use, and adult compulsive buying (CB) across a 29-year time period from mean age 14 to mean age 43. Participants originally came from a community-based random sample of residents in two upstate New York counties. Multivariate linear regression analysis was used to study the relationship between the participant's earlier psychosocial antecedents and adult CB in the fifth decade of life. The results of the multivariate linear regression analyses showed that gender (female), earlier adult impulse buying (IB), depressive mood, illicit drug use, and concurrent ADHD symptoms were all significantly associated with adult CB at mean age 43. It is important that clinicians treating CB in adults should consider the role of drug use, symptoms of ADHD, IB, depression, and family factors in CB. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  14. Compulsive Buying: Earlier Illicit Drug Use, Impulse Buying, Depression, and Adult ADHD Symptoms

    PubMed Central

    Brook, Judith S.; Zhang, Chenshu; Brook, David W.; Leukefeld, Carl G.

    2015-01-01

    This longitudinal study examined the association between psychosocial antecedents, including illicit drug use, and adult compulsive buying (CB) across a 29-year time period from mean age 14 to mean age 43. Participants originally came from a community-based random sample of residents in two upstate New York counties. Multivariate linear regression analysis was used to study the relationship between the participant’s earlier psychosocial antecedents and adult CB in the fifth decade of life. The results of the multivariate linear regression analyses showed that gender (female), earlier adult impulse buying (IB), depressive mood, illicit drug use, and concurrent ADHD symptoms were all significantly associated with adult CB at mean age 43. It is important that clinicians treating CB in adults should consider the role of drug use, symptoms of ADHD, IB, depression, and family factors in CB. PMID:26165963

  15. Scoring and staging systems using cox linear regression modeling and recursive partitioning.

    PubMed

    Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H

    2006-01-01

    Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.

  16. Directionality volatility in electroencephalogram time series

    NASA Astrophysics Data System (ADS)

    Mansor, Mahayaudin M.; Green, David A.; Metcalfe, Andrew V.

    2016-06-01

    We compare time series of electroencephalograms (EEGs) from healthy volunteers with EEGs from subjects diagnosed with epilepsy. The EEG time series from the healthy group are recorded during awake state with their eyes open and eyes closed, and the records from subjects with epilepsy are taken from three different recording regions of pre-surgical diagnosis: hippocampal, epileptogenic and seizure zone. The comparisons for these 5 categories are in terms of deviations from linear time series models with constant variance Gaussian white noise error inputs. One feature investigated is directionality, and how this can be modelled by either non-linear threshold autoregressive models or non-Gaussian errors. A second feature is volatility, which is modelled by Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) processes. Other features include the proportion of variability accounted for by time series models, and the skewness and the kurtosis of the residuals. The results suggest these comparisons may have diagnostic potential for epilepsy and provide early warning of seizures.

  17. Comparison of Linear and Non-linear Regression Analysis to Determine Pulmonary Pressure in Hyperthyroidism.

    PubMed

    Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan

    2017-01-01

    This study aimed at assessing the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension, the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method described by a first-degree equation, the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of the AIC criterion (criterion 3) and use of the F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known: level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study: pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola-type) is better than the linear one.
    The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree, whose graphical representation is a parabola.
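
    The model comparison described in this abstract (first-degree versus second-degree fit, judged by the determination coefficient, the residuals, AIC and an F-test) can be sketched in plain Python. The data below are synthetic and are not the study's measurements; the AIC uses the common Gaussian least-squares form n ln(RSS/n) + 2k, which is an assumption about how the criterion was applied, and only the F statistic is computed (no p-value lookup).

```python
import math

def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def polyfit(x, y, degree):
    """Least-squares coefficients [c0, c1, ...] via the normal equations."""
    k = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(a, b)

def fit_stats(x, y, degree):
    """Return (R^2, AIC, RSS) for a polynomial fit of the given degree."""
    coef = polyfit(x, y, degree)
    pred = [sum(c * xi ** p for p, c in enumerate(coef)) for xi in x]
    n = len(y)
    ybar = sum(y) / n
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    tss = sum((yi - ybar) ** 2 for yi in y)
    aic = n * math.log(rss / n) + 2 * (degree + 1)
    return 1.0 - rss / tss, aic, rss

def f_statistic(rss_small, rss_big, extra_params, n, k_big):
    """F statistic for nested models (small model nested in big model)."""
    return ((rss_small - rss_big) / extra_params) / (rss_big / (n - k_big))

# Synthetic curved data with a small deterministic perturbation.
x = [float(i) for i in range(10)]
y = [0.5 * xi ** 2 - 2.0 * xi + 3.0 + (0.2 if i % 2 == 0 else -0.2)
     for i, xi in enumerate(x)]
r2_lin, aic_lin, rss_lin = fit_stats(x, y, 1)
r2_quad, aic_quad, rss_quad = fit_stats(x, y, 2)
f_stat = f_statistic(rss_lin, rss_quad, extra_params=1, n=len(x), k_big=3)
print(f"linear:    R2={r2_lin:.3f} AIC={aic_lin:.1f}")
print(f"quadratic: R2={r2_quad:.3f} AIC={aic_quad:.1f} F={f_stat:.1f}")
```

    On data that really are curved, the second-degree model shows the higher R² and lower AIC, which mirrors the kind of evidence the authors cite for preferring the parabola.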

  18. Evaluation and statistical judgement of neural responses to sinusoidal stimulation in cases with superimposed drift and noise.

    PubMed

    Jastreboff, P W

    1979-06-01

    Time histograms of neural responses evoked by sinusoidal stimulation often contain a slow drift and irregular noise which disturb Fourier analysis of these responses. Section 2 of this paper evaluates the extent to which a linear drift influences the Fourier analysis, and develops a combined Fourier and linear regression analysis for detecting and correcting for such a linear drift. Usefulness of this correcting method is demonstrated for the time histograms of actual eye movements and Purkinje cell discharges evoked by sinusoidal rotation of rabbits in the horizontal plane. In Sect. 3, the analysis of variance is adopted for estimating the probability of the random occurrence of the response curve extracted by Fourier analysis from noise. This method proved to be useful for avoiding false judgements as to whether the response curve was meaningful, particularly when the response was small relative to the contaminating noise.
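
    One way to see why a linear drift and a sinusoidal response should be estimated together is a joint least-squares fit of an offset, a slope, and a sine and cosine term at the stimulus frequency. This is a generic sketch of that idea, not the procedure developed in the paper; the signal, sampling step and all names are invented for illustration.

```python
import math

def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_sine_with_drift(t, y, freq_hz):
    """Fit y ~ c0 + c1*t + a*sin(wt) + b*cos(wt) by least squares.

    Returns (offset, drift_slope, amplitude, phase), where the sinusoidal
    part equals amplitude * sin(wt + phase).
    """
    w = 2.0 * math.pi * freq_hz
    rows = [[1.0, ti, math.sin(w * ti), math.cos(w * ti)] for ti in t]
    k = 4
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    c0, c1, a, b = solve(ata, aty)
    return c0, c1, math.hypot(a, b), math.atan2(b, a)

# Synthetic response: 1 Hz sinusoid (amplitude 2, phase 0.5) on a linear drift.
t = [0.01 * i for i in range(200)]
y = [5.0 + 3.0 * ti + 2.0 * math.sin(2.0 * math.pi * ti + 0.5) for ti in t]
offset, drift, amplitude, phase = fit_sine_with_drift(t, y, freq_hz=1.0)
print(f"offset={offset:.3f} drift={drift:.3f} "
      f"amplitude={amplitude:.3f} phase={phase:.3f}")
```

    Fitting the slope and the sinusoid in one step matters because a straight line is not orthogonal to a sine wave over a finite record, so removing the trend first would bias the estimated amplitude and phase.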

  19. A field trial of ethyl hexanediol against Aedes dorsalis in Sonoma County, California.

    PubMed

    Rutledge, L C; Hooper, R L; Wirtz, R A; Gupta, R K

    1989-09-01

    The repellent ethyl hexanediol (2-ethyl-1,3-hexanediol) was tested against the mosquito Aedes dorsalis in a coastal salt marsh in California. The experimental design incorporated a linear regression model, sequential treatments and a proportional end point (95%) for protection time. The protection time of 0.10 mg/cm2 ethyl hexanediol was estimated at 0.8 h. This time is shorter than that obtained previously for deet (N,N-diethyl-3-methylbenzamide) against Ae. dorsalis (4.4 h).

  20. Early Student Support to Investigate the Role of Sea Ice Albedo Feedback in Sea Ice Predictions

    DTIC Science & Technology

    2015-09-30

    time periods: 1925-1960, 1970-2005, 2015-2050, and 2060-2095. Model runs from the first two time periods had historical radiative forcing, whereas the...of the Arctic exhibits the relationship seen near the sea ice edge in the late 20th century. • Between 2015-2050 and 2060-2095, there is a regime...1980). Ice-free summers are not found until 2060s. • From the linear regressions, air temperatures decrease in importance over time as good

  1. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    EPA Science Inventory

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  2. A simplified competition data analysis for radioligand specific activity determination.

    PubMed

    Venturino, A; Rivera, E S; Bergoc, R M; Caro, R A

    1990-01-01

    Non-linear regression and two-step linear fit methods were developed to determine the actual specific activity of 125I-ovine prolactin by radioreceptor self-displacement analysis. The experimental results obtained by the different methods are superposable. The non-linear regression method is considered to be the most adequate procedure to calculate the specific activity, but if its software is not available, the other described methods are also suitable.

  3. Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions

    PubMed Central

    Fernandes, Bruno J. T.; Roque, Alexandre

    2018-01-01

    Height and weight are measurements used to track nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, height and weight can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations whose coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than those obtained with conventional linear regressions. In all the cases, the predictions are insensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366

  4. Electricity Consumption in the Industrial Sector of Jordan: Application of Multivariate Linear Regression and Adaptive Neuro-Fuzzy Techniques

    NASA Astrophysics Data System (ADS)

    Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.

    2009-08-01

    In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.

  5. Assessing the risk of bovine fasciolosis using linear regression analysis for the state of Rio Grande do Sul, Brazil.

    PubMed

    Silva, Ana Elisa Pereira; Freitas, Corina da Costa; Dutra, Luciano Vieira; Molento, Marcelo Beltrão

    2016-02-15

    Fasciola hepatica is the causative agent of fasciolosis, a disease that triggers a chronic inflammatory process in the liver affecting mainly ruminants and other animals including humans. In Brazil, F. hepatica occurs in larger numbers in the southernmost state of Rio Grande do Sul. The objective of this study was to estimate areas at risk using an eight-year (2002-2010) time series of climatic and environmental variables that best relate to the disease, applying a linear regression method to municipalities in the state of Rio Grande do Sul. The positivity index of the disease, which is the rate of infected animals per slaughtered animal, was divided into three risk classes: low, medium and high. The accuracy of the known sample classification on the confusion matrix for the low, medium and high rates produced by the estimated model presented values between 39 and 88% depending on the year. The regression analysis showed the importance of the time-based data for the construction of the model, considering the two variables of the previous year of the event (positivity index and maximum temperature). The generated data are important for epidemiological and parasite control studies mainly because F. hepatica is an infection that can last from months to years. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    NASA Astrophysics Data System (ADS)

    Gorgees, Hazim Mansoor; Mahdi, Fatimah Assim

    2018-05-01

    This article is concerned with comparing the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.

  7. Time-varying trends of global vegetation activity

    NASA Astrophysics Data System (ADS)

    Pan, N.; Feng, X.; Fu, B.

    2016-12-01

    Vegetation plays an important role in regulating the energy change, water cycle and biochemical cycle in terrestrial ecosystems. Monitoring the dynamics of vegetation activity and understanding their driving factors have been an important issue in global change research. Normalized Difference Vegetation Index (NDVI), an indicator of vegetation activity, has been widely used in investigating vegetation changes at regional and global scales. Most studies utilized linear regression or piecewise linear regression approaches to obtain an averaged changing rate over a certain time span, with an implicit assumption that the trend didn't change over time during that period. However, no evidence shows that this assumption is right for the non-linear and non-stationary NDVI time series. In this study, we adopted the multidimensional ensemble empirical mode decomposition (MEEMD) method to extract the time-varying trends of NDVI from original signals without any a priori assumption of their functional form. Our results show that vegetation trends are spatially and temporally non-uniform during 1982-2013. Most vegetated area exhibited greening trends in the 1980s. Nevertheless, the area with greening trends decreased over time since the early 1990s, and the greening trends have stalled or even reversed in many places. Regions with browning trends were mainly located in southern low latitudes in the 1980s, whose area decreased before the middle 1990s and then increased at an accelerated rate. The greening-to-browning reversals were widespread across all continents except Oceania (43% of the vegetated areas), most of which happened after the middle 1990s. In contrast, the browning-to-greening reversals occurred in smaller area and earlier time. The area with monotonic greening and browning trends accounted for 33% and 5% of the vegetated area, respectively. By performing partial correlation analyses between NDVI and climatic elements (temperature, precipitation and cloud cover) and analyzing the MEEMD-extracted trends of these climatic elements, we discussed possible driving factors of the time-varying trends of NDVI in several specific regions where trend reversals occurred.

  8. Understanding Preprocedure Patient Flow in IR.

    PubMed

    Zafar, Abdul Mueed; Suri, Rajeev; Nguyen, Tran Khanh; Petrash, Carson Cope; Fazal, Zanira

    2016-08-01

    To quantify preprocedural patient flow in interventional radiology (IR) and to identify potential contributors to preprocedural delays. An administrative dataset was used to compute time intervals required for various preprocedural patient-flow processes. These time intervals were compared across on-time/delayed cases and inpatient/outpatient cases by Mann-Whitney U test. Spearman ρ was used to assess any correlation of the rank of a procedure on a given day and the procedure duration to the preprocedure time. A linear-regression model of preprocedure time was used to further explore potential contributing factors. Any identified reason(s) for delay were collated. P < .05 was considered statistically significant. Of the total 1,091 cases, 65.8% (n = 718) were delayed. Significantly more outpatient cases started late compared with inpatient cases (81.4% vs 45.0%; P < .001, χ² test). The multivariate linear regression model showed outpatient status, length of delay in arrival, and longer procedure times to be significantly associated with longer preprocedure times. Late arrival of patients (65.9%), unavailability of physicians (18.4%), and unavailability of procedure room (13.0%) were the three most frequently identified reasons for delay. The delay was multifactorial in 29.6% of cases (n = 213). Objective measurement of preprocedural IR patient flow demonstrated considerable waste and highlighted high-yield areas of possible improvement. A data-driven approach may aid efficient delivery of IR care. Copyright © 2016 SIR. Published by Elsevier Inc. All rights reserved.

  9. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    PubMed

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  10. Multi-linear regression of sea level in the south west Pacific as a first step towards local sea level projections

    NASA Astrophysics Data System (ADS)

    Kumar, Vandhna; Meyssignac, Benoit; Melet, Angélique; Ganachaud, Alexandre

    2017-04-01

    Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years is up to 3 times the global average. In this study, we attempt to reconstruct sea levels at selected sites in the region (Suva, Lautoka, Noumea - Fiji and New Caledonia) as a multiple-linear regression of atmospheric and oceanic variables. We focus on interannual-to-decadal scale variability, and lower (including the global mean sea level rise) over the 1979-2014 period. Sea levels are taken from tide gauge records and the ORAS4 reanalysis dataset, and are expressed as a sum of steric and mass changes as a preliminary step. The key development in our methodology is using leading wind stress curl as a proxy for the thermosteric component. This is based on the knowledge that wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. The analysis is primarily based on correlation between local sea level and selected predictors, the dominant one being wind stress curl. In the first step, proxy boxes for wind stress curl are determined via regions of highest correlation. The proportion of sea level explained via linear regression is then removed, leaving a residual. This residual is then correlated with other locally acting potential predictors: halosteric sea level, the zonal and meridional wind stress components, and sea surface temperature. The statistically significant predictors are used in a multi-linear regression function to simulate the observed sea level. The method is able to reproduce between 40 and 80% of the variance in observed sea level. Based on the skill of the model, it has high potential in sea level projection and downscaling studies.
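
    The two-stage screening described in this abstract (regress sea level on the dominant predictor, remove the explained part, then correlate the residual with the remaining candidates) can be sketched with ordinary least squares and Pearson correlation. This is a minimal illustration, not the authors' code; the function names and the toy numbers are hypothetical, and no significance testing is included.

```python
import math


def ols(x, y):
    """Simple linear regression y ~ intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def residual_after(x, y):
    """Part of y left over after removing the linear fit on x."""
    slope, intercept = ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def screen_predictors(target, primary, candidates):
    """Correlate each candidate series with the residual of target regressed on primary."""
    resid = residual_after(primary, target)
    return {name: pearson(series, resid) for name, series in candidates.items()}


# Toy series (hypothetical values): 'primary' stands in for the wind stress curl proxy.
primary = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
second = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
sea_level = [1.0 + 2.0 * p + 0.5 * s for p, s in zip(primary, second)]

resid = residual_after(primary, sea_level)
scores = screen_predictors(sea_level, primary, {"second": second})
print(scores)
```

    By construction the residual is uncorrelated with the primary predictor, so any remaining correlation with a candidate reflects variance the first regression did not explain.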

  11. Stratospheric Ozone Trends and Variability as Seen by SCIAMACHY from 2002 to 2012

    NASA Technical Reports Server (NTRS)

    Gebhardt, C.; Rozanov, A.; Hommel, R.; Weber, M.; Bovensmann, H.; Burrows, J. P.; Degenstein, D.; Froidevaux, L.; Thompson, A. M.

    2014-01-01

    Vertical profiles of the rate of linear change (trend) in the altitude range 15-50 km are determined from decadal O3 time series obtained from SCIAMACHY/ENVISAT measurements in limb-viewing geometry. The trends are calculated by using a multivariate linear regression. Seasonal variations, the quasi-biennial oscillation, signatures of the solar cycle and the El Nino-Southern Oscillation are accounted for in the regression. The time range of trend calculation is August 2002-April 2012. The analysis focuses on the zonal bands of 20 deg N - 20 deg S (tropics), 60 - 50 deg N, and 50 - 60 deg S (midlatitudes). In the tropics, positive trends of up to 5% per decade between 20 and 30 km and negative trends of up to 10% per decade between 30 and 38 km are identified. Positive O3 trends of around 5% per decade are found in the upper stratosphere in the tropics and at midlatitudes. Comparisons between SCIAMACHY and EOS MLS show reasonable agreement both in the tropics and at midlatitudes for most altitudes. In the tropics, measurements from OSIRIS/Odin and SHADOZ are also analysed. These yield rates of linear change of O3 similar to those from SCIAMACHY. However, the trends from SCIAMACHY near 34 km in the tropics are larger than MLS and OSIRIS by a factor of around two.

  12. Alzheimer's Disease Detection by Pseudo Zernike Moment and Linear Regression Classification.

    PubMed

    Wang, Shui-Hua; Du, Sidan; Zhang, Yin; Phillips, Preetha; Wu, Le-Nan; Chen, Xian-Qing; Zhang, Yu-Dong

    2017-01-01

    This study presents an improved method based on "Gorji et al. Neuroscience. 2015" by introducing a relatively new classifier: linear regression classification. Our method selects one axial slice from a 3D brain image, and employs pseudo Zernike moments with a maximum order of 15 to extract 256 features from each image. Finally, linear regression classification is used as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  13. A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection.

    PubMed

    Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C

    2011-09-01

    Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.

  14. Carotid Flow Time Test Performance for the Detection of Dehydration in Children With Diarrhea.

    PubMed

    Mackenzie, David C; Nasrin, Sabiha; Atika, Bita; Modi, Payal; Alam, Nur H; Levine, Adam C

    2018-06-01

    Unstructured clinical assessments of dehydration in children are inaccurate. Point-of-care ultrasound is a noninvasive diagnostic tool that can help evaluate the volume status; the corrected carotid artery flow time has been shown to predict volume depletion in adults. We sought to determine the ability of the corrected carotid artery flow time to identify dehydration in a population of children presenting with acute diarrhea in Dhaka, Bangladesh. Children presenting with acute diarrhea were recruited and rehydrated according to hospital protocols. The corrected carotid artery flow time was measured at the time of presentation. The percentage of weight change with rehydration was used to categorize each child's dehydration as severe (>9%), some (3%-9%), or none (<3%). A receiver operating characteristic curve was constructed to test the performance of the corrected carotid artery flow time for detecting severe dehydration. Linear regression was used to model the relationship between the corrected carotid artery flow time and percentage of dehydration. A total of 350 children (0-60 months) were enrolled. The mean corrected carotid artery flow time was 326 milliseconds (interquartile range, 295-351 milliseconds). The area under the receiver operating characteristic curve for the detection of severe dehydration was 0.51 (95% confidence interval, 0.42, 0.61). Linear regression modeling showed a weak association between the flow time and dehydration. The corrected carotid artery flow time was a poor predictor of severe dehydration in this population of children with diarrhea. © 2017 by the American Institute of Ultrasound in Medicine.

  15. A Common Mechanism for Resistance to Oxime Reactivation of Acetylcholinesterase Inhibited by Organophosphorus Compounds

    DTIC Science & Technology

    2013-01-01

    application of the Hammett equation with the constants rph in the chemistry of organophosphorus compounds, Russ. Chem. Rev. 38 (1969) 795–811. [13...of oximes and OP compounds and the ability of oximes to reactivate OP- inhibited AChE. Multiple linear regression equations were analyzed using...phosphonate pairs, 21 oxime/ phosphoramidate pairs and 12 oxime/phosphate pairs. The best linear regression equation resulting from multiple regression anal

  16. Regression analysis of sparse asynchronous longitudinal data

    PubMed Central

    Cao, Hongyuan; Zeng, Donglin; Fine, Jason P.

    2015-01-01

    Summary We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus. PMID:26568699

  17. Econometric models of road use, accidents, and road investment decisions. Volume 2 : an econometric model of car ownership, road use, accidents, and their severity (Essay 3)

    DOT National Transportation Integrated Search

    1999-11-01

    Using a fairly large cross-section/time-series data base, covering all provinces of Norway and all months between January 1973 and December 1994, we estimate non-linear (Box-Cox) regression equations explaining aggregate car ownership, road use, seat...

  18. 77 FR 3147 - Approval and Promulgation of Air Quality Implementation Plans; Delaware, New Jersey, and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-23

    ... monitors with missing data. Maximum recorded values are substituted for the missing data. The resulting... which the incomplete site is missing data. The linear regression relationship is based on time periods... between the monitors is used to fill in missing data for the incomplete monitor, so that the normal data...

  19. Spatial regression methods capture prediction uncertainty in species distribution model projections through time

    Treesearch

    Alan K. Swanson; Solomon Z. Dobrowski; Andrew O. Finley; James H. Thorne; Michael K. Schwartz

    2013-01-01

    The uncertainty associated with species distribution model (SDM) projections is poorly characterized, despite its potential value to decision makers. Error estimates from most modelling techniques have been shown to be biased due to their failure to account for spatial autocorrelation (SAC) of residual error. Generalized linear mixed models (GLMM) have the ability to...

  20. Age, Sex, and Body Composition as Predictors of Children's Performance on Basic Motor Abilities and Health-Related Fitness Items.

    ERIC Educational Resources Information Center

    Pissanos, Becky W.; And Others

    1983-01-01

    Step-wise linear regressions were used to relate children's age, sex, and body composition to performance on basic motor abilities including balance, speed, agility, power, coordination, and reaction time, and to health-related fitness items including flexibility, muscle strength and endurance and cardiovascular functions. Eighty subjects were in…

  1. HOS network-based classification of power quality events via regression algorithms

    NASA Astrophysics Data System (ADS)

    Palomares Salas, José Carlos; González de la Rosa, Juan José; Sierra Fernández, José María; Pérez, Agustín Agüera

    2015-12-01

    This work compares seven regression algorithms implemented in artificial neural networks (ANNs) supported by 14 power-quality features, which are based in higher-order statistics. Combining time and frequency domain estimators to deal with non-stationary measurement sequences, the final goal of the system is the implementation in the future smart grid to guarantee compatibility between all equipment connected. The principal results are based in spectral kurtosis measurements, which easily adapt to the impulsive nature of the power quality events. These results verify that the proposed technique is capable of offering interesting results for power quality (PQ) disturbance classification. The best results are obtained using radial basis networks, generalized regression, and multilayer perceptron, mainly due to the non-linear nature of data.

  2. Comparison of Conventional and ANN Models for River Flow Forecasting

    NASA Astrophysics Data System (ADS)

    Jain, A.; Ganti, R.

    2011-12-01

    Hydrological models are useful in many water resources applications such as flood control, irrigation and drainage, hydro power generation, water supply, erosion and sediment control, etc. Estimates of runoff are needed in many water resources planning, design development, operation and maintenance activities. River flow is generally estimated using time series or rainfall-runoff models. Recently, soft artificial intelligence tools such as Artificial Neural Networks (ANNs) have become popular for research purposes but have not been extensively adopted in operational hydrological forecasts. There is a strong need to develop ANN models based on real catchment data and compare them with the conventional models. In this paper, a comparative study has been carried out for river flow forecasting using the conventional and ANN models. Among the conventional models, multiple linear and non-linear regression models, and time series models of the auto-regressive (AR) type, have been developed. A feed-forward neural network model structure trained using the back-propagation algorithm, a gradient search method, was adopted. The daily river flow data derived from the Godavari Basin at Polavaram, Andhra Pradesh, India have been employed to develop all the models included here. Two inputs, the flows at two past time steps (Q(t-1) and Q(t-2)), were selected using partial autocorrelation analysis for forecasting flow at time t, Q(t). A wide range of error statistics have been used to evaluate the performance of all the models developed in this study. It has been found that the regression and AR models performed comparably, and the ANN model performed the best amongst all the models investigated in this study. It is concluded that the ANN model should be adopted in real catchments for hydrological modeling and forecasting.
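
    The conventional baseline named in this abstract, forecasting Q(t) from Q(t-1) and Q(t-2), amounts to a two-predictor linear regression. The sketch below fits that model by least squares without an intercept, which is a simplifying assumption; the function names and the synthetic flow series are hypothetical and are not the Godavari data.

```python
def fit_ar2(q):
    """Least-squares fit of q[t] ~ a1 * q[t-1] + a2 * q[t-2] (no intercept term)."""
    s11 = s12 = s22 = s1y = s2y = 0.0
    for t in range(2, len(q)):
        x1, x2, y = q[t - 1], q[t - 2], q[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("lagged flows are collinear; coefficients are not identifiable")
    # Cramer's rule on the 2x2 normal equations
    a1 = (s1y * s22 - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    return a1, a2


def forecast_next(q, a1, a2):
    """One-step-ahead forecast from the two most recent flows."""
    return a1 * q[-1] + a2 * q[-2]


# Synthetic, noise-free series generated with known coefficients 0.5 and 0.3.
flows = [1.0, 2.0]
for _ in range(28):
    flows.append(0.5 * flows[-1] + 0.3 * flows[-2])

a1, a2 = fit_ar2(flows)
print(a1, a2, forecast_next(flows, a1, a2))
```

    With real daily flows one would normally add an intercept, hold out a validation period, and compare error statistics against the competing models, as the abstract describes.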

  3. Bone Mineral Density across a Range of Physical Activity Volumes: NHANES 2007–2010

    PubMed Central

    Whitfield, Geoffrey P.; Kohrt, Wendy M.; Pettee Gabriel, Kelley K.; Rahbar, Mohammad H.; Kohl, Harold W.

    2014-01-01

    Introduction The association between aerobic physical activity volume and bone mineral density (BMD) is not completely understood. The purpose of this study was to clarify the association between BMD and aerobic activity across a broad range of activity volumes, in particular volumes between those recommended in the 2008 Physical Activity Guidelines for Americans and those of trained endurance athletes. Methods Data from the 2007–2010 National Health and Nutrition Examination Survey were used to quantify the association between reported physical activity and BMD at the lumbar spine and proximal femur across the entire range of activity volumes reported by US adults. Participants were categorized into multiples of the minimum guideline-recommended volume based on reported moderate and vigorous intensity leisure activity. Lumbar and proximal femur BMD was assessed with dual-energy x-ray absorptiometry. Results Among women, multivariable-adjusted linear regression analyses revealed no significant differences in lumbar BMD across activity categories, while proximal femur BMD was significantly higher among those who exceeded guidelines by 2–4 times than those who reported no activity. Among men, multivariable-adjusted BMD at both sites neared its highest values among those who exceeded guidelines by at least 4 times and was not progressively higher with additional activity. Logistic regression estimating the odds of low BMD generally echoed the linear regression results. Conclusion The association between physical activity volume and BMD is complex. Among women, exceeding guidelines by 2–4 times may be important for maximizing BMD at the proximal femur, while among men, exceeding guidelines by 4+ times may be beneficial for lumbar and proximal femur BMD. PMID:24870584

  4. Mapping Regional Impervious Surface Distribution from Night Time Light: The Variability across Global Cities

    NASA Astrophysics Data System (ADS)

    Lin, M.; Yang, Z.; Park, H.; Qian, S.; Chen, J.; Fan, P.

    2017-12-01

    Impervious surface area (ISA) has become an important indicator for studying urban environments, but mapping ISA at the regional or global scale is still challenging due to the complexity of impervious surface features. The Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) nighttime light (NTL) data and Moderate Resolution Imaging Spectroradiometer (MODIS) data are the major remote sensing data sources for regional ISA mapping. A single regression relationship between fractional ISA and NTL, or various indices derived from NTL and MODIS vegetation index (NDVI) data, was established in many previous studies for regional ISA mapping. However, due to the varying geographical, climatic, and socio-economic characteristics of different cities, the same regression relationship may vary significantly across different cities in the same region in terms of both fitting performance (i.e. R2) and the rate of change (slope). In this study, we examined the regression relationship between fractional ISA and the Vegetation Adjusted Nighttime light Urban Index (VANUI) for 120 randomly selected cities around the world with a multilevel regression model. We found that there is indeed substantial variability in both the R2 (0.68±0.29) and the slopes (0.64±0.40) among individual regressions, which suggests that multilevel/hierarchical models are needed to improve the accuracy of future regional ISA mapping. Further analysis also showed that this substantial variability is affected by climate conditions, socio-economic status, and urban spatial structures. However, all these effects are nonlinear rather than linear, and thus could not be modeled explicitly in multilevel linear regression models.

  5. Specialization Agreements in the Council for Mutual Economic Assistance

    DTIC Science & Technology

    1988-02-01

    proportions to stabilize variance (S. Weisberg, Applied Linear Regression, 2nd ed., John Wiley & Sons, New York, 1985, p. 134). If the dependent...27, 1986, p. 3. Weisberg, S., Applied Linear Regression, 2nd ed., John Wiley & Sons, New York, 1985, p. 134. Wiles, P. J., Communist International

  6. Radio Propagation Prediction Software for Complex Mixed Path Physical Channels

    DTIC Science & Technology

    2006-08-14

    63 4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz 69 4.4.7. Projected Scaling to...4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz In order to construct a comprehensive numerical algorithm capable of

  7. Data Transformations for Inference with Linear Regression: Clarifications and Recommendations

    ERIC Educational Resources Information Center

    Pek, Jolynn; Wong, Octavia; Wong, C. M.

    2017-01-01

    Data transformations have been promoted as a popular and easy-to-implement remedy to address the assumption of normally distributed errors (in the population) in linear regression. However, the application of data transformations introduces non-ignorable complexities which should be fully appreciated before their implementation. This paper adds to…

  8. USING LINEAR AND POLYNOMIAL MODELS TO EXAMINE THE ENVIRONMENTAL STABILITY OF VIRUSES

    EPA Science Inventory

    The article presents the development of model equations for describing the fate of viral infectivity in environmental samples. Most of the models were based upon the use of a two-step linear regression approach. The first step employs regression of log base 10 transformed viral t...

  9. Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Camilleri, Liberato; Cefai, Carmel

    2013-01-01

    Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…

  10. Mutual information estimation for irregularly sampled time series

    NASA Astrophysics Data System (ADS)

    Rehfeld, K.; Marwan, N.; Heitzig, J.; Kurths, J.

    2012-04-01

    For the automated, objective and joint analysis of time series, similarity measures are crucial. Used in the analysis of climate records, they allow for a complementary, unbiased view onto sparse datasets. The irregular sampling of many of these time series, however, makes it necessary to either perform signal reconstruction (e.g. interpolation) or to develop and use adapted measures. Standard linear interpolation comes with an inevitable loss of information and bias effects. We have recently developed a Gaussian kernel-based correlation algorithm with which the interpolation error can be substantially lowered, but this would not work should the functional relationship in a bivariate setting be non-linear. We therefore propose an algorithm to estimate lagged auto and cross mutual information from irregularly sampled time series. We have extended the standard and adaptive binning histogram estimators and use Gaussian distributed weights in the estimation of the (joint) probabilities. To test our method we have simulated linear and nonlinear auto-regressive processes with Gamma-distributed inter-sampling intervals. We have then performed a sensitivity analysis for the estimation of actual coupling length, the lag of coupling and the decorrelation time in the synthetic time series and contrasted our results with the performance of a signal reconstruction scheme. Finally we applied our estimator to speleothem records. We compare the estimated memory (or decorrelation time) to that from a least-squares estimator based on fitting an auto-regressive process of order 1. The calculated (cross) mutual information results are compared for the different estimators (standard or adaptive binning) and contrasted with results from signal reconstruction. We find that the kernel-based estimator has a significantly lower root mean square error and less systematic sampling bias than the interpolation-based method. It is possible that these encouraging results could be further improved by using non-histogram mutual information estimators, like k-Nearest Neighbor or Kernel-Density estimators, but for short (<1000 points) and irregularly sampled datasets the proposed algorithm is already a great improvement.
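
    The idea of weighting observation pairs by a Gaussian kernel of their time separation, instead of interpolating onto a regular grid, can be sketched as follows. This illustrates only the kernel-weighted correlation idea mentioned in the abstract, not the authors' mutual information estimator; the function names and the bandwidth parameter h are assumptions made for illustration.

```python
import math

def standardize(v):
    """Zero mean, unit (population) standard deviation."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
    return [(a - m) / s for a in v]

def gaussian_kernel_xcorr(tx, x, ty, y, lag, h):
    """Kernel-weighted cross-correlation of two irregularly sampled series.

    Every pair (x_i, y_j) contributes, weighted by how close the time
    difference t_j - t_i is to the requested lag. h is the kernel
    bandwidth (an assumed tuning parameter).
    """
    xs, ys = standardize(x), standardize(y)
    num = den = 0.0
    for ti, xi in zip(tx, xs):
        for tj, yj in zip(ty, ys):
            d = (tj - ti) - lag
            w = math.exp(-0.5 * (d / h) ** 2)
            num += w * xi * yj
            den += w
    return num / den

# Irregular sampling times and made-up values.
t = [0.0, 0.7, 2.1, 2.9, 4.4, 5.0]
v = [1.0, 0.4, -0.3, 0.8, 1.5, 0.2]
r_self = gaussian_kernel_xcorr(t, v, t, v, lag=0.0, h=0.01)
```

    With a very narrow kernel and zero lag, only coincident samples receive weight, so a series correlated with itself gives a value of 1.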

  11. Simple and multiple linear regression: sample size considerations.

    PubMed

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
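
    A textbook closed-form result of the kind the abstract alludes to is the standard error of a regression slope, SE(b) = σ / (s_x · sqrt(n − 1) · sqrt(1 − R²_x)), where σ is the residual standard deviation, s_x the sample standard deviation of the predictor, and R²_x the squared multiple correlation of that predictor with the other covariates (zero in simple regression). The sketch below rearranges it for n; the function names and the example numbers are hypothetical, and this is not taken from the article itself.

```python
import math

def slope_standard_error(sigma, sd_x, n, r2_with_other_x=0.0):
    """SE of an OLS slope; sd_x is the sample SD of x (n - 1 divisor)."""
    return sigma / (sd_x * math.sqrt(n - 1) * math.sqrt(1.0 - r2_with_other_x))

def n_for_target_se(sigma, sd_x, target_se, r2_with_other_x=0.0):
    """Smallest n whose slope SE does not exceed target_se."""
    needed = (sigma / (sd_x * target_se)) ** 2 / (1.0 - r2_with_other_x)
    return math.ceil(needed) + 1

# Illustrative values only: residual SD 10, predictor SD 2, target SE 0.5.
n_simple = n_for_target_se(10, 2, 0.5)
n_adjusted = n_for_target_se(10, 2, 0.5, r2_with_other_x=0.5)
```

    The formula shows why a fixed subjects-per-variable rule is a blunt tool: the required n depends on the residual noise, the spread of the exposure, and its collinearity with the confounders.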

  12. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

    PubMed Central

    Jiang, Feng; Han, Ji-zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
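
    Locally weighted linear regression fits a separate weighted least-squares line for every query point, giving nearby training points more weight. The following is a minimal one-feature sketch of that generic technique, assuming a Gaussian weighting kernel with bandwidth tau; it is not the FCLWLR algorithm, and the names are hypothetical.

```python
import math

def lwlr_predict(x_query, xs, ys, tau):
    """Locally weighted linear regression prediction at x_query (one feature)."""
    # Gaussian weights: training points near the query dominate the fit.
    w = [math.exp(-((x - x_query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my + slope * (x_query - mx)

# On exactly linear data (y = 2x + 1) the local fit reproduces the line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
pred = lwlr_predict(2.5, xs, ys, tau=1.0)
```

    Because no global parameter vector is stored, the method is nonparametric; the bandwidth tau controls the trade-off between underfitting (large tau) and overfitting (small tau).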

  13. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression.

    PubMed

    Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  14. SU-F-R-20: Image Texture Features Correlate with Time to Local Failure in Lung SBRT Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrews, M; Abazeed, M; Woody, N

    Purpose: To explore possible correlation between CT image-based texture and histogram features and time-to-local-failure in early stage non-small cell lung cancer (NSCLC) patients treated with stereotactic body radiotherapy (SBRT). Methods and Materials: From an IRB-approved lung SBRT registry for patients treated between 2009 and 2013 we selected 48 (20 male, 28 female) patients with local failure. Median patient age was 72.3±10.3 years. Mean time to local failure was 15 ± 7.1 months. Physician-contoured gross tumor volumes (GTV) on the planning CT images were processed and 3D gray-level co-occurrence matrix (GLCM) based texture and histogram features were calculated in Matlab. Data were exported to R and a multiple linear regression model was used to examine the relationship between texture features and time-to-local-failure. Results: Multiple linear regression revealed that entropy (p=0.0233, multiple R²=0.60) from GLCM-based texture analysis and the standard deviation (p=0.0194, multiple R²=0.60) from the histogram-based features were statistically significantly correlated with the time-to-local-failure. Conclusion: Image-based texture analysis can be used to predict certain aspects of treatment outcomes of NSCLC patients treated with SBRT. We found that entropy and standard deviation calculated for the GTV on the CT images displayed a statistically significant correlation with time-to-local-failure in lung SBRT patients.

  15. [Quantitative relationship between gas chromatographic retention time and structural parameters of alkylphenols].

    PubMed

    Ruan, Xiaofang; Zhang, Ruisheng; Yao, Xiaojun; Liu, Mancang; Fan, Botao

    2007-03-01

    Alkylphenols are a group of persistent pollutants in the environment and could adversely disturb the human endocrine system. It is therefore important to effectively separate and measure the alkylphenols. To guide the chromatographic analysis of these compounds in practice, the development of a quantitative relationship between the molecular structure and the retention time of alkylphenols becomes necessary. In this study, topological, constitutional, geometrical, electrostatic and quantum-chemical descriptors of 44 alkylphenols were calculated using the CODESSA software, and these descriptors were pre-selected using the heuristic method. As a result, a three-descriptor linear model (LM) was developed to describe the relationship between the molecular structure and the retention time of alkylphenols. Meanwhile, a non-linear regression model was also developed based on a support vector machine (SVM) using the same three descriptors. The correlation coefficients (R²) for the LM and SVM were 0.98 and 0.92, and the corresponding root-mean-square errors were 0.99 and 2.77, respectively. By comparing the stability and prediction ability of the two models, it was found that the linear model was a better method for describing the quantitative relationship between the retention time of alkylphenols and the molecular structure. The results obtained suggested that the linear model could be applied for the chromatographic analysis of alkylphenols with known molecular structural parameters.

  16. Analysis of Binary Adherence Data in the Setting of Polypharmacy: A Comparison of Different Approaches

    PubMed Central

    Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.

    2009-01-01

    Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358

  17. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing: robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  18. Non-linear auto-regressive models for cross-frequency coupling in neural time series

    PubMed Central

    Tallot, Lucille; Grabot, Laetitia; Doyère, Valérie; Grenier, Yves; Gramfort, Alexandre

    2017-01-01

    We address the issue of reliably detecting and quantifying cross-frequency coupling (CFC) in neural time series. Based on non-linear auto-regressive models, the proposed method provides a generative and parametric model of the time-varying spectral content of the signals. As this method models the entire spectrum simultaneously, it avoids the pitfalls related to incorrect filtering or the use of the Hilbert transform on wide-band signals. As the model is probabilistic, it also provides a score of the model “goodness of fit” via the likelihood, enabling easy and legitimate model selection and parameter comparison; this data-driven feature is unique to our model-based approach. Using three datasets obtained with invasive neurophysiological recordings in humans and rodents, we demonstrate that these models are able to replicate previous results obtained with other metrics, but also reveal new insights such as the influence of the amplitude of the slow oscillation. Using simulations, we demonstrate that our parametric method can reveal neural couplings with shorter signals than non-parametric methods. We also show how the likelihood can be used to find optimal filtering parameters, suggesting new properties on the spectrum of the driving signal, but also to estimate the optimal delay between the coupled signals, enabling a directionality estimation in the coupling. PMID:29227989

  19. Estimating mono- and bi-phasic regression parameters using a mixture piecewise linear Bayesian hierarchical model

    PubMed Central

    Zhao, Rui; Catalano, Paul; DeGruttola, Victor G.; Michor, Franziska

    2017-01-01

    The dynamics of tumor burden, secreted proteins or other biomarkers over time, is often used to evaluate the effectiveness of therapy and to predict outcomes for patients. Many methods have been proposed to investigate longitudinal trends to better characterize patients and to understand disease progression. However, most approaches assume a homogeneous patient population and a uniform response trajectory over time and across patients. Here, we present a mixture piecewise linear Bayesian hierarchical model, which takes into account both population heterogeneity and nonlinear relationships between biomarkers and time. Simulation results show that our method was able to classify subjects according to their patterns of treatment response with greater than 80% accuracy in the three scenarios tested. We then applied our model to a large randomized controlled phase III clinical trial of multiple myeloma patients. Analysis results suggest that the longitudinal tumor burden trajectories in multiple myeloma patients are heterogeneous and nonlinear, even among patients assigned to the same treatment cohort. In addition, between cohorts, there are distinct differences in terms of the regression parameters and the distributions among categories in the mixture. Those results imply that longitudinal data from clinical trials may harbor unobserved subgroups and nonlinear relationships; accounting for both may be important for analyzing longitudinal data. PMID:28723910

  20. Agreement evaluation of AVHRR and MODIS 16-day composite NDVI data sets

    USGS Publications Warehouse

    Ji, Lei; Gallo, Kevin P.; Eidenshink, Jeffery C.; Dwyer, John L.

    2008-01-01

    Satellite-derived normalized difference vegetation index (NDVI) data have been used extensively to detect and monitor vegetation conditions at regional and global levels. A combination of NDVI data sets derived from AVHRR and MODIS can be used to construct a long NDVI time series that may also be extended to VIIRS. Comparative analysis of NDVI data derived from AVHRR and MODIS is critical to understanding the data continuity through the time series. In this study, the AVHRR and MODIS 16-day composite NDVI products were compared using regression and agreement analysis methods. The analysis shows a high agreement between the AVHRR-NDVI and MODIS-NDVI observed from 2002 and 2003 for the conterminous United States, but the difference between the two data sets is appreciable. Twenty per cent of the total difference between the two data sets is due to systematic difference, with the remainder due to unsystematic difference. The systematic difference can be eliminated with a linear regression-based transformation between two data sets, and the unsystematic difference can be reduced partially by applying spatial filters to the data. We conclude that the continuity of NDVI time series from AVHRR to MODIS is satisfactory, but a linear transformation between the two sets is recommended.
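
    The abstract does not state its formulas, so the following sketch uses one common way to split the mean squared difference between two data sets into a systematic part (removable by a linear transformation) and an unsystematic part (scatter around the fitted line). It assumes an ordinary least-squares fit of y on x, for which the two parts add up exactly to the total; the study's own agreement analysis may differ. All names and toy values are hypothetical.

```python
from statistics import mean

def difference_decomposition(x, y):
    """Split mean((y - x)^2) into systematic and unsystematic parts via OLS."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    y_hat = [a + b * xi for xi in x]
    n = len(x)
    total = sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / n
    systematic = sum((yh - xi) ** 2 for yh, xi in zip(y_hat, x)) / n
    unsystematic = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / n
    return {"a": a, "b": b, "total": total,
            "systematic": systematic, "unsystematic": unsystematic}

# Made-up NDVI-like values for two sensors.
x = [0.20, 0.40, 0.60, 0.80]
y = [0.25, 0.41, 0.66, 0.79]
parts = difference_decomposition(x, y)
share = parts["systematic"] / parts["total"]  # fraction removable by y' = (y - a) / b
```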

  1. Predicting Retention Times of Naturally Occurring Phenolic Compounds in Reversed-Phase Liquid Chromatography: A Quantitative Structure-Retention Relationship (QSRR) Approach

    PubMed Central

    Akbar, Jamshed; Iqbal, Shahid; Batool, Fozia; Karim, Abdul; Chan, Kim Wei

    2012-01-01

    Quantitative structure-retention relationships (QSRRs) have successfully been developed for naturally occurring phenolic compounds in a reversed-phase liquid chromatographic (RPLC) system. A total of 1519 descriptors were calculated from the optimized structures of the molecules using MOPAC2009 and DRAGON softwares. The data set of 39 molecules was divided into training and external validation sets. For feature selection and mapping we used step-wise multiple linear regression (SMLR), unsupervised forward selection followed by step-wise multiple linear regression (UFS-SMLR) and artificial neural networks (ANN). Stable and robust models with significant predictive abilities in terms of validation statistics were obtained with negation of any chance correlation. ANN models were found better than remaining two approaches. HNar, IDM, Mp, GATS2v, DISP and 3D-MoRSE (signals 22, 28 and 32) descriptors based on van der Waals volume, electronegativity, mass and polarizability, at atomic level, were found to have significant effects on the retention times. The possible implications of these descriptors in RPLC have been discussed. All the models are proven to be quite able to predict the retention times of phenolic compounds and have shown remarkable validation, robustness, stability and predictive performance. PMID:23203132

  2. Stone volume is best predictor of operative time required in retrograde intrarenal surgery for renal calculi: implications for surgical planning and quality improvement.

    PubMed

    Sorokin, Igor; Cardona-Grau, Diana K; Rehfuss, Alexandra; Birney, Alan; Stavrakis, Costas; Leinwand, Gabriel; Herr, Allen; Feustel, Paul J; White, Mark D

    2016-11-01

    Retrograde intrarenal surgery (RIRS) is highly successful at eliminating renal stones of various sizes and compositions. As urologists are taking on more complex procedures using RIRS, this has led to an increase in operative (OR) times. Our objective was to determine the best predictor of OR time in patients undergoing RIRS. We retrospectively reviewed the records of patients undergoing unilateral RIRS for solitary stones over a 10 year time span. Stones were fragmented and actively extracted using a basket. Variables potentially affecting OR time such as patient age, sex, BMI, lower pole stone location, volume, Hounsfield units (HU), composition, ureteral access sheath (UAS) use, and pre-operative stenting were collected. Multivariable linear and stepwise regression was used to evaluate the predictors of OR time. There were 118 patients that met inclusion criteria. The median stone volume was 282.6 mm³ (IQR 150.7-644.7) and the mean OR time was 50 min (±25.9 SD). On univariate linear regression, stone volume had a moderate correlation with OR time (y = 0.022x + 38.2, r² = 0.363, p < 0.01). On multivariable stepwise regression, stone volume had the strongest impact on OR time, increasing time by 2.0 min for each 100 mm³ increase in stone volume (p < 0.001). UAS use added 13.5 min (SE 3.9, p = 0.001) and lower pole stone location added 9 min (SE 4.3, p = 0.03). Pre-operative stenting, HU, calcium oxalate stone composition, sex, and age had no significant effect on OR time. Amongst the main stone factors in RIRS, stone volume has the strongest impact on operative time. This can be used to predict the length of the procedure by roughly adding 2 min per 100 mm³ increase in stone volume.
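
    As a worked example of the reported univariate equation (y = 0.022x + 38.2, with x the stone volume in mm³ and y the OR time in minutes), the sketch below evaluates it at the median stone volume. The function name is hypothetical, and the equation explains only about 36% of the variance, so the output is a rough planning figure rather than a prediction for an individual case.

```python
def predicted_or_time_min(stone_volume_mm3, slope=0.022, intercept=38.2):
    """OR time in minutes from the abstract's univariate regression line."""
    return slope * stone_volume_mm3 + intercept

at_median = predicted_or_time_min(282.6)  # median stone volume from the abstract
per_100_mm3 = predicted_or_time_min(100.0) - predicted_or_time_min(0.0)
print(round(at_median, 1), round(per_100_mm3, 1))
```

    The univariate slope corresponds to about 2.2 min per 100 mm³, slightly above the 2.0 min per 100 mm³ reported for the multivariable model, which also adjusts for UAS use and lower pole location.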

  3. The effects of climate change on harp seals (Pagophilus groenlandicus).

    PubMed

    Johnston, David W; Bowers, Matthew T; Friedlaender, Ari S; Lavigne, David M

    2012-01-01

    Harp seals (Pagophilus groenlandicus) have evolved life history strategies to exploit seasonal sea ice as a breeding platform. As such, individuals are prepared to deal with fluctuations in the quantity and quality of ice in their breeding areas. It remains unclear, however, how shifts in climate may affect seal populations. The present study assesses the effects of climate change on harp seals through three linked analyses. First, we tested the effects of short-term climate variability on young-of-the year harp seal mortality using a linear regression of sea ice cover in the Gulf of St. Lawrence against stranding rates of dead harp seals in the region during 1992 to 2010. A similar regression of stranding rates and North Atlantic Oscillation (NAO) index values was also conducted. These analyses revealed negative correlations between both ice cover and NAO conditions and seal mortality, indicating that lighter ice cover and lower NAO values result in higher mortality. A retrospective cross-correlation analysis of NAO conditions and sea ice cover from 1978 to 2011 revealed that NAO-related changes in sea ice may have contributed to the depletion of seals on the east coast of Canada during 1950 to 1972, and to their recovery during 1973 to 2000. This historical retrospective also reveals opposite links between neonatal mortality in harp seals in the Northeast Atlantic and NAO phase. Finally, an assessment of the long-term trends in sea ice cover in the breeding regions of harp seals across the entire North Atlantic during 1979 through 2011 using multiple linear regression models and mixed effects linear regression models revealed that sea ice cover in all harp seal breeding regions has been declining by as much as 6 percent per decade over the time series of available satellite data.

  4. The Effects of Climate Change on Harp Seals (Pagophilus groenlandicus)

    PubMed Central

    Johnston, David W.; Bowers, Matthew T.; Friedlaender, Ari S.; Lavigne, David M.

    2012-01-01

    Harp seals (Pagophilus groenlandicus) have evolved life history strategies to exploit seasonal sea ice as a breeding platform. As such, individuals are prepared to deal with fluctuations in the quantity and quality of ice in their breeding areas. It remains unclear, however, how shifts in climate may affect seal populations. The present study assesses the effects of climate change on harp seals through three linked analyses. First, we tested the effects of short-term climate variability on young-of-the year harp seal mortality using a linear regression of sea ice cover in the Gulf of St. Lawrence against stranding rates of dead harp seals in the region during 1992 to 2010. A similar regression of stranding rates and North Atlantic Oscillation (NAO) index values was also conducted. These analyses revealed negative correlations between both ice cover and NAO conditions and seal mortality, indicating that lighter ice cover and lower NAO values result in higher mortality. A retrospective cross-correlation analysis of NAO conditions and sea ice cover from 1978 to 2011 revealed that NAO-related changes in sea ice may have contributed to the depletion of seals on the east coast of Canada during 1950 to 1972, and to their recovery during 1973 to 2000. This historical retrospective also reveals opposite links between neonatal mortality in harp seals in the Northeast Atlantic and NAO phase. Finally, an assessment of the long-term trends in sea ice cover in the breeding regions of harp seals across the entire North Atlantic during 1979 through 2011 using multiple linear regression models and mixed effects linear regression models revealed that sea ice cover in all harp seal breeding regions has been declining by as much as 6 percent per decade over the time series of available satellite data. PMID:22238591

  5. Naval Research Logistics Quarterly. Volume 28. Number 3,

    DTIC Science & Technology

    1981-09-01

    denotes component-wise maximum. f has antitone (isotone) differences on C x D if for c1 < c2 and d1 < d2, NAVAL RESEARCH LOGISTICS QUARTERLY VOL. 28...or negative correlations and linear or nonlinear regressions. Given are the moments to order two and, for special cases, the regression function and...data sets. We designate this bnb distribution as G - B - N(a, 0, v). The distribution admits only of positive correlation and linear regressions

  6. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    NASA Astrophysics Data System (ADS)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by a Data Acquisition and Evaluation System. The authors coupled the collected TBM drive data with available information on rock mass properties; the data were then cleansed, completed with secondary variables, and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were built for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable, and the data were screened with different computational approaches, allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and response variables, to search for the best subsets of explanatory variables, and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than the constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  7. Spectral-Spatial Shared Linear Regression for Hyperspectral Image Classification.

    PubMed

    Haoliang Yuan; Yuan Yan Tang

    2017-04-01

    Classification of the pixels in hyperspectral image (HSI) is an important task and has been popularly applied in many practical applications. Its major challenge is the high-dimensional small-sized problem. To deal with this problem, lots of subspace learning (SL) methods are developed to reduce the dimension of the pixels while preserving the important discriminant information. Motivated by ridge linear regression (RLR) framework for SL, we propose a spectral-spatial shared linear regression method (SSSLR) for extracting the feature representation. Comparing with RLR, our proposed SSSLR has the following two advantages. First, we utilize a convex set to explore the spatial structure for computing the linear projection matrix. Second, we utilize a shared structure learning model, which is formed by original data space and a hidden feature space, to learn a more discriminant linear projection matrix for classification. To optimize our proposed method, an efficient iterative algorithm is proposed. Experimental results on two popular HSI data sets, i.e., Indian Pines and Salinas demonstrate that our proposed methods outperform many SL methods.

  8. qFeature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-09-14

    This package contains statistical routines for extracting features from multivariate time-series data, which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain-agnostic, but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
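
    The moving-window feature extraction described above can be sketched in a few lines: fit a local line to each window, then summarize the fitted slopes over a longer interval. This is a stdlib-only illustration of the idea, assuming evenly spaced samples and a linear (not quadratic) local model; it does not reproduce the qFeature package's interface, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def moving_window_slopes(y, window):
    """OLS slope of each consecutive window of y, using sample index as time."""
    t = list(range(window))
    mt = mean(t)
    stt = sum((ti - mt) ** 2 for ti in t)
    slopes = []
    for start in range(len(y) - window + 1):
        seg = y[start:start + window]
        ms = mean(seg)
        slopes.append(sum((ti - mt) * (si - ms) for ti, si in zip(t, seg)) / stt)
    return slopes

def summarize(values):
    """Summary features of the local coefficients over an interval."""
    return {"mean": mean(values), "sd": pstdev(values),
            "min": min(values), "max": max(values)}

# Made-up series whose trend steepens partway through.
series = [0, 1, 2, 3, 10, 17, 24]
slopes = moving_window_slopes(series, window=3)
features = summarize(slopes)
```

    The summary statistics of the local slopes, rather than the raw samples, become the feature vector passed to later multivariate analysis.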

  9. On the Stationarity of Multiple Autoregressive Approximants: Theory and Algorithms

    DTIC Science & Technology

    1976-08-01

    a I (3.4) Hannan and Terrell (1972) consider problems of a similar nature. Efficient estimates A(1),... , A(p) , and i of A(1)... ,A(p) and..."Autoregressive model fitting for control," Ann. Inst. Statist. Math., 23, 163-180. Hannan, E. J. (1970), Multiple Time Series, New York, John Wiley...Hannan, E. J. and Terrell, R. D. (1972), "Time series regression with linear constraints," International Economic Review, 13, 189-200. Masani, P

  10. Identifying Factors That Predict Promotion Time to E-4 and Re-Enlistment Eligibility for U.S. Marine Corps Field Radio Operators

    DTIC Science & Technology

    2014-12-01

    Primary Military Occupational Specialty PRO Proficiency Q-Q Quantile-Quantile RSS Residual Sum of Squares SI Shop Information T&R Training and...construct multivariate linear regression models to estimate Marines’ Computed Tier Score and time to achieve E-4 based on their individual personal...Science (GS) score, ASVAB Mathematics Knowledge (MK) score, ASVAB Paragraph Comprehension (PC) score, weight, and whether a Marine receives a weight

  11. Simple linear and multivariate regression models.

    PubMed

    Rodríguez del Águila, M M; Benítez-Parejo, N

    2011-01-01

    In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.

  12. Nomogram to Predict Graft Thickness in Descemet Stripping Automated Endothelial Keratoplasty: An Eye Bank Study.

    PubMed

    Bae, Steven S; Menninga, Isaac; Hoshino, Richard; Humphreys, Christine; Chan, Clara C

    2018-06-01

    The purpose of this study was to develop a nomogram to predict postcut thickness of corneal grafts prepared at an eye bank for Descemet stripping automated endothelial keratoplasty (DSAEK). Retrospective chart review was performed of DSAEK graft preparations by 3 experienced technicians from April 2012 to May 2017 at the Eye Bank of Canada-Ontario Division. Variables collected included the following: donor demographics, death-to-preservation time, death-to-processing time, precut tissue thickness, postcut tissue thickness, microkeratome head size, endothelial cell count, cut technician, and rate of perforation. Linear regression models were generated for each microkeratome head size (300 and 350 μm). A total of 780 grafts were processed during the study period. Twelve preparation attempts resulted in perforation (1.5%) and were excluded. Mean precut tissue thickness was 510 ± 49 μm (range: 363-670 μm). Mean postcut tissue thickness was 114 ± 22 μm (range: 57-193 μm). Seventy-nine percent (608/768) of grafts were ≤130 μm. The linear regression models included precut thickness and donor age, which were able to predict the thickness to within 25 μm 80% of the time. We report a nomogram to predict thickness of DSAEK corneal grafts prepared in an eye bank setting, which was accurate to within 25 μm 80% of the time. Other eye banks could consider performing similar analyses.

  13. Optimization of isotherm models for pesticide sorption on biopolymer-nanoclay composite by error analysis.

    PubMed

    Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M

    2017-04-01

    A carboxymethyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C14-C18) amine (DMDA)) composite was prepared by the solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-ray diffraction (XRD) and scanning electron microscopy (SEM). The composite was evaluated for its sorption efficiency for the pesticides atrazine, imidacloprid and thiamethoxam. The sorption data were fitted to Langmuir and Freundlich isotherms using linear and nonlinear methods. The linear regression method suggested that the sorption data were best fitted by the Type II Langmuir and Freundlich isotherms. To avoid the bias resulting from linearization, seven different error parameters were also analyzed by the nonlinear regression method. The nonlinear error analysis suggested that the sorption data fitted the Langmuir model better than the Freundlich model. The maximum sorption capacity, Q0 (μg/g), was highest for imidacloprid (2000), followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the coefficient of determination of linear regression alone cannot be used to compare the fit of the Langmuir and Freundlich models, and that nonlinear error analysis is needed to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
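    As an illustration of the linearization step mentioned in this abstract, the sketch below fits the Type II Langmuir form, 1/qe = 1/Q0 + (1/(b·Q0))·(1/Ce), by ordinary least squares. The function names, the constant b, and the concentration grid are assumptions made for the example; the data are synthetic and noise-free, not values from the study.

```python
import numpy as np

def langmuir(ce, q0, b):
    """Langmuir isotherm: sorbed amount qe at equilibrium concentration ce."""
    return q0 * b * ce / (1.0 + b * ce)

def fit_langmuir_type2(ce, qe):
    """Fit the Type II linearization 1/qe = 1/q0 + (1/(b*q0)) * (1/ce)."""
    slope, intercept = np.polyfit(1.0 / ce, 1.0 / qe, 1)
    q0 = 1.0 / intercept      # intercept equals 1/q0
    b = intercept / slope     # slope equals 1/(b*q0)
    return q0, b

# Synthetic, noise-free data (hypothetical parameter values).
ce = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
qe = langmuir(ce, q0=2000.0, b=0.3)
q0_hat, b_hat = fit_langmuir_type2(ce, qe)
print(q0_hat, b_hat)
```

    With noisy data the reciprocal transform distorts the error structure, which is the bias the abstract refers to when it recommends nonlinear error analysis.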

  14. Weighted SGD for ℓp Regression with Randomized Preconditioning.

    PubMed

    Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W

    2016-01-01

    In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., ℓ2 and ℓ1 regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. By rewriting a deterministic ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers including RLA methods with algorithmic leveraging (RLA for short). We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computation complexity. Such SGD convergence rates are superior to other related SGD algorithms such as the weighted randomized Kaczmarz algorithm. Particularly, when solving ℓ1 regression with size n by d, pwSGD returns an approximate solution with ε relative error in the objective value in 𝒪(log n·nnz(A)+poly(d)/ε²) time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints. In general this is more efficient than solving the constrained subproblem required in RLA. For ℓ2 regression, pwSGD returns an approximate solution with ε relative error in the objective value and the solution vector measured in prediction norm in 𝒪(log n·nnz(A)+poly(d) log(1/ε)/ε) time. We show that for unconstrained ℓ2 regression, this complexity is comparable to that of RLA and is asymptotically better over several state-of-the-art solvers in the regime where the desired accuracy ε, high dimension n and low dimension d satisfy d ≥ 1/ε and n ≥ d²/ε. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that still new ideas will be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that pwSGD converges to a medium-precision solution, e.g., ε = 10⁻³, more quickly.
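    The precondition-then-sample idea described in this abstract can be illustrated for unconstrained ℓ2 regression with the heavily simplified sketch below. It is not the pwSGD algorithm from the paper: it assumes an exact QR factorization in place of a randomized sketch, a constant step size of 1/d, and a hypothetical function name. Under these assumptions each update reduces to a randomized Kaczmarz step on the preconditioned system.

```python
import numpy as np

def preconditioned_weighted_sgd(A, b, n_iter=2000, seed=0):
    """Simplified illustration: QR preconditioning + importance-sampled SGD."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Preconditioning: A = Q R, so U = A R^{-1} = Q has orthonormal columns.
    Q, R = np.linalg.qr(A)
    # Importance sampling distribution from the squared row norms of U.
    row_norms_sq = np.sum(Q**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    y = np.zeros(d)
    step = 1.0 / d  # assumed constant step size
    for _ in range(n_iter):
        i = rng.choice(n, p=probs)
        # Unbiased estimate of the gradient of 0.5 * ||U y - b||^2.
        grad = (Q[i] @ y - b[i]) * Q[i] / probs[i]
        y -= step * grad
    # Map back to the original variables: x = R^{-1} y.
    return np.linalg.solve(R, y)

# Synthetic consistent system (hypothetical sizes).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
x_hat = preconditioned_weighted_sgd(A, A @ x_true)
```

    The example uses a consistent system so that the constant-step iteration converges to the exact solution; with noisy right-hand sides a decaying step size or iterate averaging would be needed.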

  15. Weighted SGD for ℓp Regression with Randomized Preconditioning

    PubMed Central

    Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W.

    2018-01-01

    In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., ℓ2 and ℓ1 regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. By rewriting a deterministic ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers including RLA methods with algorithmic leveraging (RLA for short). We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computation complexity. Such SGD convergence rates are superior to other related SGD algorithms such as the weighted randomized Kaczmarz algorithm. Particularly, when solving ℓ1 regression with size n by d, pwSGD returns an approximate solution with ε relative error in the objective value in 𝒪(log n·nnz(A)+poly(d)/ε²) time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints. In general this is more efficient than solving the constrained subproblem required in RLA. For ℓ2 regression, pwSGD returns an approximate solution with ε relative error in the objective value and the solution vector measured in prediction norm in 𝒪(log n·nnz(A)+poly(d) log(1/ε)/ε) time. We show that for unconstrained ℓ2 regression, this complexity is comparable to that of RLA and is asymptotically better over several state-of-the-art solvers in the regime where the desired accuracy ε, high dimension n and low dimension d satisfy d ≥ 1/ε and n ≥ d²/ε. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that still new ideas will be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that pwSGD converges to a medium-precision solution, e.g., ε = 10⁻³, more quickly. PMID:29782626

  16. A comparison of several methods of solving nonlinear regression groundwater flow problems

    USGS Publications Warehouse

    Cooley, Richard L.

    1985-01-01

    Computational efficiency and computer memory requirements for four methods of minimizing functions were compared for four test nonlinear-regression steady state groundwater flow problems. The fastest methods were the Marquardt and quasi-linearization methods, which required almost identical computer times and numbers of iterations; the next fastest was the quasi-Newton method, and last was the Fletcher-Reeves method, which did not converge in 100 iterations for two of the problems. The fastest method per iteration was the Fletcher-Reeves method, and this was followed closely by the quasi-Newton method. The Marquardt and quasi-linearization methods were slower. For all four methods the speed per iteration was directly related to the number of parameters in the model. However, this effect was much more pronounced for the Marquardt and quasi-linearization methods than for the other two. Hence the quasi-Newton (and perhaps Fletcher-Reeves) method might be more efficient than either the Marquardt or quasi-linearization methods if the number of parameters in a particular model were large, although this remains to be proven. The Marquardt method required somewhat less central memory than the quasi-linearization method for three of the four problems. For all four problems the quasi-Newton method required roughly two thirds to three quarters of the memory required by the Marquardt method, and the Fletcher-Reeves method required slightly less memory than the quasi-Newton method. Memory requirements were not excessive for any of the four methods.

  17. London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure

    PubMed Central

    Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith

    2017-01-01

    Background: The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods: Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results: There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion: We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343

  18. Comparison of Total Solar Irradiance with NASA/NSO Spectromagnetograph Data in Solar Cycles 22 and 23

    NASA Technical Reports Server (NTRS)

    Jones, Harrison P.; Branston, Detrick D.; Jones, Patricia B.; Popescu, Miruna D.

    2002-01-01

    An earlier study compared NASA/NSO Spectromagnetograph (SPM) data with spacecraft measurements of total solar irradiance (TSI) variations over a 1.5 year period in the declining phase of solar cycle 22. This paper extends the analysis to an eight-year period which also spans the rising and early maximum phases of cycle 23. The conclusions of the earlier work appear to be robust: three factors (sunspots, strong unipolar regions, and strong mixed polarity regions) describe most of the variation in the SPM record, but only the first two are associated with TSI. Additionally, the residuals of a linear multiple regression of TSI against SPM observations over the entire eight-year period show an unexplained, increasing, linear time variation with a rate of about 0.05 W m⁻² per year. Separate regressions for the periods before and after 1996 January 01 show no unexplained trends but differ substantially in regression parameters. This behavior may reflect a solar source of TSI variations beyond sunspots and faculae but more plausibly results from uncompensated non-solar effects in one or both of the TSI and SPM data sets.

  19. Individual differences in long-range time representation.

    PubMed

    Agostino, Camila S; Caetano, Marcelo S; Balci, Fuat; Claessens, Peter M E; Zana, Yossi

    2017-04-01

    On the basis of experimental data, long-range time representation has been proposed to follow a highly compressed power function, which has been hypothesized to explain the time inconsistency found in financial discount rate preferences. The aim of this study was to evaluate how well linear and power function models explain empirical data from individual participants tested in different procedural settings. The line paradigm was used in five different procedural variations with 35 adult participants. Data aggregated over the participants showed that fitted linear functions explained more than 98% of the variance in all procedures. A linear regression fit also outperformed a power model fit for the aggregated data. An individual-participant-based analysis showed better fits of a linear model to the data of 14 participants; better fits of a power function with an exponent β > 1 to the data of 12 participants; and better fits of a power function with β < 1 to the data of the remaining nine participants. Of the 35 volunteers, the null hypothesis β = 1 was rejected for 20. The dispersion of the individual β values was approximated well by a normal distribution. These results suggest that, on average, humans perceive long-range time intervals not in a highly compressed, biased manner, but rather in a linear pattern. However, individuals differ considerably in their subjective time scales. This contribution sheds new light on the average and individual psychophysical functions of long-range time representation, and suggests that any attribution of deviation from exponential discount rates in intertemporal choice to the compressed nature of subjective time must entail the characterization of subjective time on an individual-participant basis.

  20. Using Parametric Cost Models to Estimate Engineering and Installation Costs of Selected Electronic Communications Systems

    DTIC Science & Technology

    1994-09-01

    Institute of Technology, Wright-Patterson AFB OH, January 1994. 4. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 5...Technology, Wright-Patterson AFB OH 5 April 1994. 29. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 30. Office of

  1. An Evaluation of the Automated Cost Estimating Integrated Tools (ACEIT) System

    DTIC Science & Technology

    1989-09-01

    residual and it is described as the residual divided by its standard deviation (13:App A,17). Neter, Wasserman, and Kutner, in Applied Linear Regression Models...others. Applied Linear Regression Models. Homewood IL: Irwin, 1983. 19. Raduchel, William J. "A Professional’s Perspective on User-Friendliness," Byte

  2. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    ERIC Educational Resources Information Center

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  3. Conjoint Analysis: A Study of the Effects of Using Person Variables.

    ERIC Educational Resources Information Center

    Fraas, John W.; Newman, Isadore

    Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…

  4. Fitting program for linear regressions according to Mahon (1996)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trappitsch, Reto G.

    2018-01-09

    This program takes the user's input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple interface.

  5. How Robust Is Linear Regression with Dummy Variables?

    ERIC Educational Resources Information Center

    Blankmeyer, Eric

    2006-01-01

    Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…

  6. Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method

    ERIC Educational Resources Information Center

    Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev

    2018-01-01

    The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…

  7. An Introduction to Graphical and Mathematical Methods for Detecting Heteroscedasticity in Linear Regression.

    ERIC Educational Resources Information Center

    Thompson, Russel L.

    Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…

  8. On the null distribution of Bayes factors in linear regression

    USDA-ARS's Scientific Manuscript database

    We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...

  9. Common pitfalls in statistical analysis: Linear regression analysis

    PubMed Central

    Aggarwal, Rakesh; Ranganathan, Priya

    2017-01-01

    In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis. PMID:28447022

  10. Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.

    PubMed

    Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo

    2015-08-01

    Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.

  11. Prediction of hourly PM2.5 using a space-time support vector regression model

    NASA Astrophysics Data System (ADS)

    Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang

    2018-05-01

    Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.

  12. The Digital Shoreline Analysis System (DSAS) Version 4.0 - An ArcGIS extension for calculating shoreline change

    USGS Publications Warehouse

    Thieler, E. Robert; Himmelstoss, Emily A.; Zichichi, Jessica L.; Ergul, Ayhan

    2009-01-01

    The Digital Shoreline Analysis System (DSAS) version 4.0 is a software extension to ESRI ArcGIS v.9.2 and above that enables a user to calculate shoreline rate-of-change statistics from multiple historic shoreline positions. A user-friendly interface of simple buttons and menus guides the user through the major steps of shoreline change analysis. Components of the extension and user guide include (1) instruction on the proper way to define a reference baseline for measurements, (2) automated and manual generation of measurement transects and metadata based on user-specified parameters, and (3) output of calculated rates of shoreline change and other statistical information. DSAS computes shoreline rates of change using four different methods: (1) endpoint rate, (2) simple linear regression, (3) weighted linear regression, and (4) least median of squares. The standard error, correlation coefficient, and confidence interval are also computed for the simple and weighted linear-regression methods. The results of all rate calculations are output to a table that can be linked to the transect file by a common attribute field. DSAS is intended to facilitate the shoreline change-calculation process and to provide rate-of-change information and the statistical data necessary to establish the reliability of the calculated results. The software is also suitable for any generic application that calculates positional change over time, such as assessing rates of change of glacier limits in sequential aerial photos, river edge boundaries, land-cover changes, and so on.
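    Two of the rate methods named in this abstract, the endpoint rate and the simple linear regression rate, can be sketched for a single transect as follows. This is an illustrative sketch and not DSAS code: the function names and the sample shoreline positions are assumptions, and the endpoint rate is taken here as the net movement between the oldest and youngest shorelines divided by the elapsed time.

```python
import numpy as np

def end_point_rate(years, positions):
    """Net change between the oldest and most recent shoreline over elapsed time."""
    years = np.asarray(years, dtype=float)
    positions = np.asarray(positions, dtype=float)
    order = np.argsort(years)
    t, x = years[order], positions[order]
    return (x[-1] - x[0]) / (t[-1] - t[0])

def linear_regression_rate(years, positions):
    """Slope of the least-squares line through all shoreline positions."""
    slope, _intercept = np.polyfit(np.asarray(years, dtype=float),
                                   np.asarray(positions, dtype=float), 1)
    return slope

# Hypothetical transect: position (m) relative to a baseline at four dates.
years = [1950, 1970, 1990, 2010]
positions = [0.0, -12.0, -18.0, -30.0]
print(end_point_rate(years, positions))          # uses only first and last dates
print(linear_regression_rate(years, positions))  # uses all four dates
```

    The endpoint rate ignores intermediate shorelines, whereas the regression slope uses every position, so the two can differ when the change is not steady.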

  13. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    NASA Astrophysics Data System (ADS)

    Wu, Cheng; Zhen Yu, Jian

    2018-03-01

    Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low-R² XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
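    To make the role of the weighting parameter in Deming regression concrete, the sketch below implements the standard closed-form Deming estimator. It is a minimal illustration, not the Igor Pro program from the paper. The function name is hypothetical, and `delta` is assumed to be the ratio of the y-error variance to the x-error variance; the λ convention used by the authors may be defined differently (for example, as the inverse).

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression.

    delta is assumed to be var(y errors) / var(x errors);
    delta = 1 gives orthogonal regression.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))  # must be nonzero
    diff = syy - delta * sxx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = ym - slope * xm
    return slope, intercept

# Hypothetical data with errors in both variables.
rng = np.random.default_rng(0)
truth = np.linspace(0.0, 10.0, 50)
x_obs = truth + rng.normal(0.0, 0.5, truth.size)
y_obs = 2.0 * truth + 1.0 + rng.normal(0.0, 0.5, truth.size)
print(deming_fit(x_obs, y_obs, delta=1.0))
```

    Changing `delta` shifts how much of the scatter is attributed to each axis, which is why a mis-specified ratio biases both slope and intercept.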

  14. A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

    PubMed

    Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

    2006-08-01

    A quantitative structure-activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R²(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.

  15. Uncertainty of streamwater solute fluxes in five contrasting headwater catchments including model uncertainty and natural variability (Invited)

    NASA Astrophysics Data System (ADS)

    Aulenbach, B. T.; Burns, D. A.; Shanley, J. B.; Yanai, R. D.; Bae, K.; Wild, A.; Yang, Y.; Dong, Y.

    2013-12-01

    There are many sources of uncertainty in estimates of streamwater solute flux. Flux is the product of discharge and concentration (summed over time), each of which has measurement uncertainty of its own. Discharge can be measured almost continuously, but concentrations are usually determined from discrete samples, which increases uncertainty dependent on sampling frequency and how concentrations are assigned for the periods between samples. Gaps between samples can be estimated by linear interpolation or by models that use the relations between concentration and continuously measured or known variables such as discharge, season, temperature, and time. For this project, developed in cooperation with QUEST (Quantifying Uncertainty in Ecosystem Studies), we evaluated uncertainty for three flux estimation methods and three different sampling frequencies (monthly, weekly, and weekly plus event). The constituents investigated were dissolved NO3, Si, SO4, and dissolved organic carbon (DOC), solutes whose concentration dynamics exhibit strongly contrasting behavior. The evaluation was completed for a 10-year period at five small, forested watersheds in Georgia, New Hampshire, New York, Puerto Rico, and Vermont. Concentration regression models were developed for each solute at each of the three sampling frequencies for all five watersheds. Fluxes were then calculated using (1) a linear interpolation approach, (2) a regression-model method, and (3) the composite method, which combines the regression-model method for estimating concentrations and the linear interpolation method for correcting model residuals to the observed sample concentrations. We considered the best estimates of flux to be derived using the composite method at the highest sampling frequencies. We also evaluated the importance of sampling frequency and estimation method on flux estimate uncertainty; flux uncertainty was dependent on the variability characteristics of each solute and varied for different reporting periods (e.g., 10-year study period vs. annual vs. monthly). The usefulness of the two regression-model-based flux estimation approaches was dependent upon the amount of variance in concentrations the regression models could explain. Our results can guide the development of optimal sampling strategies by weighing sampling frequency with improvements in uncertainty in stream flux estimates for solutes with particular characteristics of variability. The appropriate flux estimation method is dependent on a combination of sampling frequency and the strength of concentration regression models. Sites: Biscuit Brook (Frost Valley, NY), Hubbard Brook Experimental Forest and LTER (West Thornton, NH), Luquillo Experimental Forest and LTER (Luquillo, Puerto Rico), Panola Mountain (Stockbridge, GA), Sleepers River Research Watershed (Danville, VT)
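    The linear interpolation approach mentioned in this abstract can be sketched as follows: sparse sample concentrations are interpolated onto the near-continuous discharge record, and the product of discharge and concentration is integrated over time. The function name, the trapezoidal integration, and the sample values are assumptions made for illustration; the regression-model and composite methods are not shown.

```python
import numpy as np

def flux_linear_interpolation(t_q, discharge, t_c, conc):
    """Integrate discharge * concentration over time.

    Concentrations sampled at times t_c are linearly interpolated
    onto the discharge time stamps t_q before integration.
    """
    t_q = np.asarray(t_q, dtype=float)
    discharge = np.asarray(discharge, dtype=float)
    c_interp = np.interp(t_q, t_c, conc)   # concentration at each discharge time
    load = discharge * c_interp            # instantaneous load
    # Trapezoidal rule over the discharge time steps.
    return float(np.sum(0.5 * (load[1:] + load[:-1]) * np.diff(t_q)))

# Hypothetical record: hourly discharge, three concentration samples.
t_q = np.arange(0.0, 11.0)            # hours 0..10
discharge = np.full(t_q.size, 3.0)    # constant discharge
t_c = [0.0, 5.0, 10.0]
conc = [2.0, 2.0, 2.0]
print(flux_linear_interpolation(t_q, discharge, t_c, conc))
```

    Units follow from the inputs: discharge in volume per time and concentration in mass per volume give a flux in mass over the integration period.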

  16. Comparing lagged linear correlation, lagged regression, Granger causality, and vector autoregression for uncovering associations in EHR data.

    PubMed

    Levine, Matthew E; Albers, David J; Hripcsak, George

    2016-01-01

    Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
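
    As an illustration of a lagged regression with an autoregressive term, the sketch below fits lab[t] = b0 + b1*lab[t-1] + b2*drug[t-lag] by ordinary least squares. It is a simplified stand-in, not the authors' model: it uses a single lag, one autoregressive term, regularly spaced observations, and no admission or context variables. All names (solve, ols, lagged_regression) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def lagged_regression(drug, lab, lag):
    """Fit lab[t] = b0 + b1 * lab[t-1] + b2 * drug[t-lag]; returns [b0, b1, b2]."""
    start = max(lag, 1)
    rows = [[1.0, lab[t - 1], drug[t - lag]] for t in range(start, len(lab))]
    return ols(rows, lab[start:])
```

    A nonzero b2 after accounting for b1 suggests that the drug series carries information about the lab series beyond the lab's own history; further explanatory columns can be appended to each row in the same way.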

  17. Partitioning sources of variation in vertebrate species richness

    USGS Publications Warehouse

    Boone, R.B.; Krohn, W.B.

    2000-01-01

    Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that would describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km2) of 275 species using literature and expert review. Ranges were combined into species richness maps, and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression. Methods were used that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and using both tree and linear regression in analyses, we explored the degree of variation in species richness for each vertebrate group that could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (each with 84-82% deviance explained). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%) and birds (57%).
Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.

  18. Revisiting the Impact of NCLB High-Stakes School Accountability, Capacity, and Resources: State NAEP 1990-2009 Reading and Math Achievement Gaps and Trends

    ERIC Educational Resources Information Center

    Lee, Jaekyung; Reeves, Todd

    2012-01-01

    This study examines the impact of high-stakes school accountability, capacity, and resources under NCLB on reading and math achievement outcomes through comparative interrupted time-series analyses of 1990-2009 NAEP state assessment data. Through hierarchical linear modeling latent variable regression with inverse probability of treatment…

  19. 78 FR 44070 - Approval and Promulgation of Air Quality Implementation Plans; Pennsylvania; Determinations of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-07-23

    ... missing data. The linear regression relationship is based on time periods in which both monitors were... fill in missing data for the incomplete monitor, so that the normal data completeness requirement of 75 percent of data in each quarter of the three years is met. After the missing data for the site is filled...
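
    The gap-filling idea in this excerpt can be sketched as follows: fit a straight line on the periods where both monitors reported, then use it to predict the incomplete monitor from the reference monitor. This is only an illustration under assumed inputs (missing values encoded as None, one value per period); the function names and numbers are hypothetical and do not reproduce the procedure in the notice.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fill_missing(incomplete, reference):
    """Replace None entries in `incomplete` with predictions from `reference`."""
    pairs = [(r, v) for r, v in zip(reference, incomplete)
             if r is not None and v is not None]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    filled = []
    for v, r in zip(incomplete, reference):
        if v is None and r is not None:
            v = intercept + slope * r  # regression-based substitute value
        filled.append(v)
    return filled


def completeness(values):
    """Fraction of periods with a reported (non-missing) value."""
    return sum(v is not None for v in values) / len(values)


reference = [10.0, 20.0, 30.0, 40.0, 50.0]
incomplete = [12.0, None, 32.0, 42.0, None]
filled = fill_missing(incomplete, reference)
```

    Here completeness(incomplete) is 0.6, below a 75 percent requirement, while completeness(filled) is 1.0 once the regression estimates are substituted.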

  20. Spatial and temporal drivers of wildfire occurrence in the context of rural development in northern Wisconsin, USA

    Treesearch

    Brian R Miranda; Brian R Sturtevant; Susan I Stewart; Roger B. Hammer

    2012-01-01

    Most drivers underlying wildfire are dynamic, but at different spatial and temporal scales. We quantified temporal and spatial trends in wildfire patterns over two spatial extents in northern Wisconsin to identify drivers and their change through time. We used spatial point pattern analysis to quantify the spatial pattern of wildfire occurrences, and linear regression...

  1. Are Environmental Influences on Physical Activity Distinct for Urban, Suburban, and Rural Schools? A Multilevel Study among Secondary School Students in Ontario, Canada

    ERIC Educational Resources Information Center

    Hobin, Erin P.; Leatherdale, Scott; Manske, Steve; Dubin, Joel A.; Elliott, Susan; Veugelers, Paul

    2013-01-01

    Background: This study examined differences in students' time spent in physical activity (PA) across secondary schools in rural, suburban, and urban environments and identified the environment-level factors associated with these between school differences in students' PA. Methods: Multilevel linear regression analyses were used to examine the…

  2. A Case for Transforming the Criterion of a Predictive Validity Study

    ERIC Educational Resources Information Center

    Patterson, Brian F.; Kobrin, Jennifer L.

    2011-01-01

    This study presents a case for applying a transformation (Box and Cox, 1964) of the criterion used in predictive validity studies. The goals of the transformation were to better meet the assumptions of the linear regression model and to reduce the residual variance of fitted (i.e., predicted) values. Using data for the 2008 cohort of first-time,…
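
    For reference, the Box-Cox transformation maps a positive value y to (y^lambda - 1) / lambda, or to log(y) when lambda is 0, and lambda is commonly chosen by maximizing a profile log-likelihood. The sketch below applies that idea to a regression with a single predictor and a fixed grid of candidate lambdas. The single-predictor setup, the grid search, and all function names are assumptions made for illustration; they are not the procedure used in the study.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of positive values y for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def residual_ss(x, z):
    """Residual sum of squares from a least-squares line of z on x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return sum((zi - mz - slope * (xi - mx)) ** 2 for xi, zi in zip(x, z))


def profile_loglik(x, y, lam):
    """Profile log-likelihood of lambda, including the Jacobian term."""
    n = len(y)
    rss = max(residual_ss(x, boxcox(y, lam)), 1e-12)  # guard against log(0)
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(v) for v in y)


def best_lambda(x, y, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))
```

    The Jacobian term (lambda - 1) * sum(log y) keeps likelihoods comparable across lambdas; without it, transformations that merely shrink the scale of y would always appear to fit better.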

  3. Examination of the Arborsonic Decay Detector for Detecting Bacterial Wetwood in Red Oaks

    Treesearch

    Zicai Xu; Theodor D. Leininger; James G. Williams; Frank H. Tainter

    2000-01-01

    The Arborsonic Decay Detector (ADD; Fujikura Europe Limited, Wiltshire, England) was used to measure the time it took an ultrasound wave to cross 280 diameters in red oak trees with varying degrees of bacterial wetwood or heartwood decay. Linear regressions derived from the ADD readings of trees in Mississippi and South Carolina with wetwood and heartwood decay...

  4. A comparative look at sunspot cycles

    NASA Technical Reports Server (NTRS)

    Wilson, R. M.

    1984-01-01

    On the basis of cycles 8 through 20, spanning about 143 years, observations of sunspot number, smoothed sunspot number, and their temporal properties were used to compute means, standard deviations, ranges, and frequency of occurrence histograms for a number of sunspot cycle parameters. The resultant schematic sunspot cycle was contrasted with the mean sunspot cycle, obtained by averaging smoothed sunspot number as a function of time, tying all cycles (8 through 20) to their minimum occurrence date. A relatively good approximation of the time variation of smoothed sunspot number for a given cycle is possible if sunspot cycles are regarded in terms of being either HIGH- or LOW-R(MAX) cycles or LONG- or SHORT-PERIOD cycles, especially the latter. Linear regression analyses were performed comparing late cycle parameters with early cycle parameters and solar cycle number. The early occurring cycle parameters can be used to estimate later occurring cycle parameters with relatively good success, based on cycle 21 as an example. The sunspot cycle record clearly shows that the trend for both R(MIN) and R(MAX) was toward decreasing value between cycles 8 through 14 and toward increasing value between cycles 14 through 20. Linear regression equations were also obtained for several measures of solar activity.

  5. An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

    NASA Astrophysics Data System (ADS)

    Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

    2018-03-01

    In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
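
    The link between t-distributed errors and iteratively reweighted least squares can be shown with a deliberately reduced sketch: a straight-line model, a fixed degree of freedom nu, and no autoregressive noise component. Each pass computes the weighted least-squares fit, updates the scale, and then sets the weights to (nu + 1) / (nu + r^2 / scale^2), so large residuals are downweighted. This is not the algorithm derived in the paper, which also estimates the AR coefficients and the degree of freedom; the function names are hypothetical.

```python
def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for y = b0 + b1 * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def irls_t(x, y, nu=3.0, iters=50):
    """IRLS for a line with Student-t errors (nu fixed, no AR component)."""
    n = len(x)
    w = [1.0] * n  # the first pass is ordinary least squares
    for _ in range(iters):
        b0, b1 = weighted_line(x, y, w)
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Scale update uses the weights from the previous pass.
        scale2 = max(sum(wi * ri * ri for wi, ri in zip(w, r)) / n, 1e-12)
        # Weight update: observations with large residuals get small weights.
        w = [(nu + 1.0) / (nu + ri * ri / scale2) for ri in r]
    return b0, b1, w
```

    With a single gross outlier in otherwise near-linear data, the outlier ends up with the smallest weight and the fitted slope stays close to the slope of the remaining points.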

  6. A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

    PubMed

    Lin, Lei; Wang, Qian; Sadek, Adel W

    2016-06-01

    The duration of freeway traffic accidents is an important factor, which affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of a "time-to-event" modeling scenario, and given this, HBDMs have been previously applied to analyze and predict traffic accidents duration. Previous research, however, has not yet applied HBDMs for accident duration prediction, in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) a HBDM; and (3) the proposed M5P-HBDM, and the remainder of data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs.
Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. An evaluation of dynamic mutuality measurements and methods in cyclic time series

    NASA Astrophysics Data System (ADS)

    Xia, Xiaohua; Huang, Guitian; Duan, Na

    2010-12-01

    Several measurements and techniques have been developed to detect dynamic mutuality and synchronicity of time series in econometrics. This study aims to compare the performances of five methods, i.e., linear regression, dynamic correlation, Markov switching models, concordance index and recurrence quantification analysis, through numerical simulations. We evaluate the abilities of these methods to capture structure changing and cyclicity in time series and the findings of this paper would offer guidance to both academic and empirical researchers. Illustration examples are also provided to demonstrate the subtle differences of these techniques.

  8. Comparison between light scattering and gravimetric samplers for PM10 mass concentration in poultry and pig houses

    NASA Astrophysics Data System (ADS)

    Cambra-López, María; Winkel, Albert; Mosquera, Julio; Ogink, Nico W. M.; Aarnink, André J. A.

    2015-06-01

    The objective of this study was to compare co-located real-time light scattering devices and equivalent gravimetric samplers in poultry and pig houses for PM10 mass concentration, and to develop animal-specific calibration factors for light scattering samplers. These results will contribute to evaluate the comparability of different sampling instruments for PM10 concentrations. Paired DustTrak light scattering device (DustTrak aerosol monitor, TSI, U.S.) and PM10 gravimetric cyclone sampler were used for measuring PM10 mass concentrations during 24 h periods (from noon to noon) inside animal houses. Sampling was conducted in 32 animal houses in the Netherlands, including broilers, broiler breeders, layers in floor and in aviary system, turkeys, piglets, growing-finishing pigs in traditional and low emission housing with dry and liquid feed, and sows in individual and group housing. A total of 119 pairs of 24 h measurements (55 for poultry and 64 for pigs) were recorded and analyzed using linear regression analysis. Deviations between samplers were calculated and discussed. In poultry, cyclone sampler and DustTrak data fitted well to a linear regression, with a regression coefficient equal to 0.41, an intercept of 0.16 mg m-3 and a correlation coefficient of 0.91 (excluding turkeys). Results in turkeys showed a regression coefficient equal to 1.1 (P = 0.49), an intercept of 0.06 mg m-3 (P < 0.0001) and a correlation coefficient of 0.98. In pigs, we found a regression coefficient equal to 0.61, an intercept of 0.05 mg m-3 and a correlation coefficient of 0.84. Measured PM10 concentrations using DustTraks were clearly underestimated (approx. by a factor 2) in both poultry and pig housing systems compared with cyclone pre-separators. Absolute, relative, and random deviations increased with concentration. DustTrak light scattering devices should be self-calibrated to investigate PM10 mass concentrations accurately in animal houses. 
We recommend linear regression equations as animal-specific calibration factors for DustTraks instead of manufacturer calibration factors, especially in heavily dusty environments such as animal houses.

  9. Dynamic linear models using the Kalman filter for early detection and early warning of malaria outbreaks

    NASA Astrophysics Data System (ADS)

    Merkord, C. L.; Liu, Y.; DeVos, M.; Wimberly, M. C.

    2015-12-01

    Malaria early detection and early warning systems are important tools for public health decision makers in regions where malaria transmission is seasonal and varies from year to year with fluctuations in rainfall and temperature. Here we present a new data-driven dynamic linear model based on the Kalman filter with time-varying coefficients that are used to identify malaria outbreaks as they occur (early detection) and predict the location and timing of future outbreaks (early warning). We fit linear models of malaria incidence with trend and Fourier form seasonal components using three years of weekly malaria case data from 30 districts in the Amhara Region of Ethiopia. We identified past outbreaks by comparing the modeled prediction envelopes with observed case data. Preliminary results demonstrated the potential for improved accuracy and timeliness over commonly-used methods in which thresholds are based on simpler summary statistics of historical data. Other benefits of the dynamic linear modeling approach include robustness to missing data and the ability to fit models with relatively few years of training data. To predict future outbreaks, we started with the early detection model for each district and added a regression component based on satellite-derived environmental predictor variables including precipitation data from the Tropical Rainfall Measuring Mission (TRMM) and land surface temperature (LST) and spectral indices from the Moderate Resolution Imaging Spectroradiometer (MODIS). We included lagged environmental predictors in the regression component of the model, with lags chosen based on cross-correlation of the one-step-ahead forecast errors from the first model. Our results suggest that predictions of future malaria outbreaks can be improved by incorporating lagged environmental predictors.
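
    The early-detection idea, comparing each observation with a one-step-ahead prediction envelope from a Kalman filter, can be sketched with the simplest dynamic linear model, a local level (random walk plus noise). The model in this record also includes trend, Fourier seasonal, and regression components, none of which appear here; the variances, the 1.96 multiplier, and the function name are illustrative assumptions. Missing observations (None) are handled by skipping the update step.

```python
import math


def local_level_filter(y, obs_var, state_var, m0=0.0, c0=1e6, z=1.96):
    """Scalar Kalman filter for a local-level model.

    Returns the one-step-ahead (forecast, upper bound) pairs and a flag per
    observation that is True when the observation exceeds the upper bound.
    """
    m, c = m0, c0  # posterior mean and variance of the level
    forecasts, flags = [], []
    for obs in y:
        r = c + state_var       # prior variance of the level at this step
        f = m                   # one-step-ahead forecast
        q = r + obs_var         # forecast variance
        upper = f + z * math.sqrt(q)
        forecasts.append((f, upper))
        if obs is None:         # missing data: carry the prior forward
            m, c = f, r
            flags.append(False)
            continue
        flags.append(obs > upper)
        k = r / q               # Kalman gain
        m = f + k * (obs - f)
        c = (1.0 - k) * r
    return forecasts, flags


# Hypothetical weekly counts with one missing week and one spike.
cases = [10, 10, 10, 10, None, 10, 10, 30, 10]
forecasts, flags = local_level_filter(cases, obs_var=1.0, state_var=0.1)
```

    Only the spike is flagged in this toy series; the large initial variance c0 keeps the first observation from being flagged before the filter has learned the level.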

  10. United States Medical Licensing Examination and American Board of Pediatrics Certification Examination Results: Does the Residency Program Contribute to Trainee Achievement.

    PubMed

    Welch, Thomas R; Olson, Brad G; Nelsen, Elizabeth; Beck Dallaghan, Gary L; Kennedy, Gloria A; Botash, Ann

    2017-09-01

    To determine whether training site or prior examinee performance on the US Medical Licensing Examination (USMLE) step 1 and step 2 might predict pass rates on the American Board of Pediatrics (ABP) certifying examination. Data from graduates of pediatric residency programs completing the ABP certifying examination between 2009 and 2013 were obtained. For each, results of the initial ABP certifying examination were obtained, as well as results on National Board of Medical Examiners (NBME) step 1 and step 2 examinations. Hierarchical linear modeling was used to nest first-time ABP results within training programs to isolate program contribution to ABP results while controlling for USMLE step 1 and step 2 scores. Stepwise linear regression was then used to determine which of these examinations was a better predictor of ABP results. A total of 1110 graduates of 15 programs had complete testing results and were subject to analysis. Mean ABP scores for these programs ranged from 186.13 to 214.32. The hierarchical linear model suggested that the interaction of step 1 and 2 scores predicted ABP performance (F[1,1007.70] = 6.44, P = .011). By conducting a multilevel model by training program, both USMLE step examinations predicted first-time ABP results (b = .002, t = 2.54, P = .011). Linear regression analyses indicated that step 2 results were a better predictor of ABP performance than step 1 or a combination of the two USMLE scores. Performance on the USMLE examinations, especially step 2, predicts performance on the ABP certifying examination. The contribution of training site to ABP performance was statistically significant, though contributed modestly to the effect compared with prior USMLE scores. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Burnout does not help predict depression among French school teachers.

    PubMed

    Bianchi, Renzo; Schonfeld, Irvin Sam; Laurent, Eric

    2015-11-01

    Burnout has been viewed as a phase in the development of depression. However, supportive research is scarce. We examined whether burnout predicted depression among French school teachers. We conducted a 2-wave, 21-month study involving 627 teachers (73% female) working in French primary and secondary schools. Burnout was assessed with the Maslach Burnout Inventory and depression with the 9-item depression module of the Patient Health Questionnaire (PHQ-9). The PHQ-9 grades depressive symptom severity and provides a provisional diagnosis of major depression. Depression was treated both as a continuous and categorical variable using linear and logistic regression analyses. We controlled for gender, age, and length of employment. Controlling for baseline depressive symptoms, linear regression analysis showed that burnout symptoms at time 1 (T1) did not predict depressive symptoms at time 2 (T2). Baseline depressive symptoms accounted for about 88% of the association between T1 burnout and T2 depressive symptoms. Only baseline depressive symptoms predicted depressive symptoms at follow-up. Similarly, logistic regression analysis revealed that burnout symptoms at T1 did not predict incident cases of major depression at T2 when depressive symptoms at T1 were included in the predictive model. Only baseline depressive symptoms predicted cases of major depression at follow-up. This study does not support the view that burnout is a phase in the development of depression. Assessing burnout symptoms in addition to "classical" depressive symptoms may not always improve our ability to predict future depression.

  12. Simulation of groundwater level variations using wavelet combined with neural network, linear regression and support vector machine

    NASA Astrophysics Data System (ADS)

    Ebrahimi, Hadi; Rajaee, Taher

    2017-01-01

    Simulation of groundwater level (GWL) fluctuations is an important task in management of groundwater resources. In this study, the effect of wavelet analysis on the training of the artificial neural network (ANN), multi linear regression (MLR) and support vector regression (SVR) approaches was investigated, and the ANN, MLR and SVR along with the wavelet-ANN (WNN), wavelet-MLR (WLR) and wavelet-SVR (WSVR) models were compared in simulating GWL one month ahead. The only variable used to develop the models was the monthly GWL data recorded over a period of 11 years from two wells in the Qom plain, Iran. The results showed that decomposing the GWL time series into several sub-time series greatly improved the training of the models. For both wells 1 and 2, the Meyer and Db5 wavelets produced better results compared to the other wavelets, which indicated that wavelet types had similar behavior in similar case studies. The optimal number of delays was 6 months, which seems to be due to natural phenomena. The best WNN model, using the Meyer mother wavelet with two decomposition levels, simulated one month ahead with RMSE values of 0.069 m and 0.154 m for wells 1 and 2, respectively. The RMSE values for the WLR model were 0.058 m and 0.111 m, and for the WSVR model were 0.136 m and 0.060 m for wells 1 and 2, respectively.

  13. Analysis of regression methods for solar activity forecasting

    NASA Technical Reports Server (NTRS)

    Lundquist, C. A.; Vaughan, W. W.

    1979-01-01

    The paper deals with the potential use of the most recent solar data to project trends in the next few years. Assuming that a mode of solar influence on weather can be identified, advantageous use of that knowledge presumably depends on estimating future solar activity. A frequently used technique for solar cycle predictions is a linear regression procedure along the lines formulated by McNish and Lincoln (1949). The paper presents a sensitivity analysis of the behavior of such regression methods relative to the following aspects: cycle minimum, time into cycle, composition of historical data base, and unnormalized vs. normalized solar cycle data. Comparative solar cycle forecasts for several past cycles are presented as to these aspects of the input data. Implications for the current cycle, No. 21, are also given.

  14. Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards.

    PubMed

    Churpek, Matthew M; Yuen, Trevor C; Winslow, Christopher; Meltzer, David O; Kattan, Michael W; Edelson, Dana P

    2016-02-01

    Machine learning methods are flexible prediction algorithms that may be more accurate than conventional regression. We compared the accuracy of different techniques for detecting clinical deterioration on the wards in a large, multicenter database. Observational cohort study. Five hospitals, from November 2008 until January 2013. Hospitalized ward patients. No interventions. Demographic variables, laboratory values, and vital signs were utilized in a discrete-time survival analysis framework to predict the combined outcome of cardiac arrest, intensive care unit transfer, or death. Two logistic regression models (one using linear predictor terms and a second utilizing restricted cubic splines) were compared to several different machine learning methods. The models were derived in the first 60% of the data by date and then validated in the next 40%. For model derivation, each event time window was matched to a non-event window. All models were compared to each other and to the Modified Early Warning score, a commonly cited early warning score, using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patients were admitted, and 424 cardiac arrests, 13,188 intensive care unit transfers, and 2,840 deaths occurred in the study. In the validation dataset, the random forest model was the most accurate model (AUC, 0.80 [95% CI, 0.80-0.80]). The logistic regression model with spline predictors was more accurate than the model utilizing linear predictors (AUC, 0.77 vs 0.74; p < 0.01), and all models were more accurate than the MEWS (AUC, 0.70 [95% CI, 0.70-0.70]). In this multicenter study, we found that several machine learning methods more accurately predicted clinical deterioration than logistic regression. Use of detection algorithms derived from these techniques may result in improved identification of critically ill patients on the wards.

  15. Structure-function relationships using spectral-domain optical coherence tomography: comparison with scanning laser polarimetry.

    PubMed

    Aptel, Florent; Sayous, Romain; Fortoul, Vincent; Beccat, Sylvain; Denis, Philippe

    2010-12-01

    To evaluate and compare the regional relationships between visual field sensitivity and retinal nerve fiber layer (RNFL) thickness as measured by spectral-domain optical coherence tomography (OCT) and scanning laser polarimetry. Prospective cross-sectional study. One hundred and twenty eyes of 120 patients (40 with healthy eyes, 40 with suspected glaucoma, and 40 with glaucoma) were tested on Cirrus-OCT, GDx VCC, and standard automated perimetry. Raw data on RNFL thickness were extracted for 256 peripapillary sectors of 1.40625 degrees each for the OCT measurement ellipse and 64 peripapillary sectors of 5.625 degrees each for the GDx VCC measurement ellipse. Correlations between peripapillary RNFL thickness in 6 sectors and visual field sensitivity in the 6 corresponding areas were evaluated using linear and logarithmic regression analysis. Receiver operating curve areas were calculated for each instrument. With spectral-domain OCT, the correlations (r(2)) between RNFL thickness and visual field sensitivity ranged from 0.082 (nasal RNFL and corresponding visual field area, linear regression) to 0.726 (supratemporal RNFL and corresponding visual field area, logarithmic regression). By comparison, with GDx-VCC, the correlations ranged from 0.062 (temporal RNFL and corresponding visual field area, linear regression) to 0.362 (supratemporal RNFL and corresponding visual field area, logarithmic regression). In pairwise comparisons, these structure-function correlations were generally stronger with spectral-domain OCT than with GDx VCC and with logarithmic regression than with linear regression. The largest areas under the receiver operating curve were seen for OCT superior thickness (0.963 ± 0.022; P < .001) in eyes with glaucoma and for OCT average thickness (0.888 ± 0.072; P < .001) in eyes with suspected glaucoma. 
The structure-function relationship was significantly stronger with spectral-domain OCT than with scanning laser polarimetry, and was better expressed logarithmically than linearly. Measurements with these 2 instruments should not be considered to be interchangeable. Copyright © 2010 Elsevier Inc. All rights reserved.

  16. Multiple long-term trends and trend reversals dominate environmental conditions in a man-made freshwater reservoir.

    PubMed

    Znachor, Petr; Nedoma, Jiří; Hejzlar, Josef; Seďa, Jaromír; Kopáček, Jiří; Boukal, David; Mrkvička, Tomáš

    2018-05-15

    Man-made reservoirs are common across the world and provide a wide range of ecological services. Environmental conditions in riverine reservoirs are affected by the changing climate, catchment-wide processes and manipulations with the water level, and water abstraction from the reservoir. Long-term trends of environmental conditions in reservoirs thus reflect a wider range of drivers in comparison to lakes, which makes the understanding of reservoir dynamics more challenging. We analysed a 32-year time series of 36 environmental variables characterising weather, land use in the catchment, reservoir hydrochemistry, hydrology and light availability in the small, canyon-shaped Římov Reservoir in the Czech Republic to detect underlying trends, trend reversals and regime shifts. To do so, we fitted linear and piecewise linear regression and a regime shift model to the time series of mean annual values of each variable and to principal components produced by Principal Component Analysis. Models were weighted and ranked using Akaike information criterion and the model selection approach. Most environmental variables exhibited temporal changes that included time-varying trends and trend reversals. For instance, dissolved organic carbon showed a linear increasing trend while nitrate concentration or conductivity exemplified trend reversal. All trend reversals and cessations of temporal trends in reservoir hydrochemistry (except total phosphorus concentrations) occurred in the late 1980s and during 1990s as a consequence of dramatic socioeconomic changes. After a series of heavy rains in the late 1990s, an administrative decision to increase the flood-retention volume of the reservoir resulted in a significant regime shift in reservoir hydraulic conditions in 1999. Our analyses also highlight the utility of the model selection framework, based on relatively simple extensions of linear regression, to describe temporal trends in reservoir characteristics. 
This approach can provide a solid basis for a better understanding of processes in freshwater reservoirs. Copyright © 2017 Elsevier B.V. All rights reserved.
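
    The model-selection step in this record, comparing a linear trend with a piecewise linear trend by the Akaike information criterion, can be sketched as below. The sketch assumes Gaussian errors, a single breakpoint searched over the observed x values, a continuous hinge term max(0, x - c), and a parameter count that includes the error variance and the breakpoint. It omits the regime shift model, and every name in it is hypothetical rather than taken from the paper.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rss(rows, y):
    """Least-squares fit via the normal equations; returns (coefficients, RSS)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def compare_trend_models(x, y):
    """Return {model: (AIC, breakpoint)} and Akaike weights for both models."""
    n = len(x)
    _, rss_lin = fit_rss([[1.0, xi] for xi in x], y)
    scores = {"linear": (aic(rss_lin, n, 3), None)}  # intercept, slope, variance
    best = None
    for c in x[2:-2]:  # candidate breakpoints away from the ends of the series
        rows = [[1.0, xi, max(0.0, xi - c)] for xi in x]
        _, rss = fit_rss(rows, y)
        if best is None or rss < best[0]:
            best = (rss, c)
    scores["piecewise"] = (aic(best[0], n, 5), best[1])  # + second slope, breakpoint
    low = min(s[0] for s in scores.values())
    rel = {name: math.exp(-(s[0] - low) / 2.0) for name, s in scores.items()}
    total = sum(rel.values())
    weights = {name: value / total for name, value in rel.items()}
    return scores, weights
```

    For a series that rises and then falls (a trend reversal), the piecewise model receives nearly all of the Akaike weight and its breakpoint marks the year of the reversal; for a steady trend, the extra parameters are penalized and the linear model is preferred.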

  17. A Simulation-Based Comparison of Several Stochastic Linear Regression Methods in the Presence of Outliers.

    ERIC Educational Resources Information Center

    Rule, David L.

    Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…

  18. Prediction of elemental creep. [steady state and cyclic data from regression analysis]

    NASA Technical Reports Server (NTRS)

    Davis, J. W.; Rummler, D. R.

    1975-01-01

    Cyclic and steady-state creep tests were performed to provide data which were used to develop predictive equations. These equations, describing creep as a function of stress, temperature, and time, were developed through the use of a least squares regression analysis computer program for both the steady-state and cyclic data sets. Comparison of the data from the two types of tests revealed that there was no significant difference between the cyclic and steady-state creep strains for the L-605 sheet under the experimental conditions investigated (for the same total time at load). Attempts to develop a single linear equation describing the combined steady-state and cyclic creep data resulted in standard errors of estimate higher than those obtained for the individual data sets. A proposed approach to predict elemental creep in metals uses the cyclic creep equation and a computer program which applies strain and time hardening theories of creep accumulation.

  19. Unit Cohesion and the Surface Navy: Does Cohesion Affect Performance

    DTIC Science & Technology

    1989-12-01

    v. 68, 1968. Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. Rand Corporation R-2607...Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. SAS User’s Guide: Basics, Version 5 ed

  20. Comparison of Selection Procedures and Validation of Criterion Used in Selection of Significant Control Variates of a Simulation Model

    DTIC Science & Technology

    1990-03-01

    and M.H. Kutner. Applied Linear Regression Models. Homewood IL: Richard D. Irwin Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM...Control Variates in Simulation," European Journal of Operational Research, 42: (1989). Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Regression Models

  1. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    ERIC Educational Resources Information Center

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  2. Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course

    ERIC Educational Resources Information Center

    Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna

    2010-01-01

    Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…

  3. Some Applied Research Concerns Using Multiple Linear Regression Analysis.

    ERIC Educational Resources Information Center

    Newman, Isadore; Fraas, John W.

    The intention of this paper is to provide an overall reference on how a researcher can apply multiple linear regression in order to utilize the advantages that it has to offer. The advantages and some concerns expressed about the technique are examined. A number of practical ways by which researchers can deal with such concerns as…

  4. Using Simple Linear Regression to Assess the Success of the Montreal Protocol in Reducing Atmospheric Chlorofluorocarbons

    ERIC Educational Resources Information Center

    Nelson, Dean

    2009-01-01

    Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…

  5. Quantum State Tomography via Linear Regression Estimation

    PubMed Central

    Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan

    2013-01-01

    A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d^4) where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519

  6. Applications of statistics to medical science, III. Correlation and regression.

    PubMed

    Watanabe, Hiroshi

    2012-01-01

    In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.

  7. A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.

    PubMed

    Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S

    2017-06-01

    The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET_d) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET_d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. 
As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET_d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
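
    The comparison of polynomial degrees with and without experimental uncertainties can be illustrated with a short sketch. The function names are hypothetical, and the nested F statistic is an assumed choice of test, since the abstract does not say which statistical test was used. Passing an array of ones as sigma gives the unweighted fit.

```python
import numpy as np
from numpy.polynomial import Polynomial

def weighted_chi2_by_degree(x, y, sigma, max_degree=5):
    """Weighted residual sum of squares for polynomial fits of degree 1..max_degree."""
    chi2 = {}
    for degree in range(1, max_degree + 1):
        # w multiplies the unsquared residuals, so 1/sigma gives inverse-variance weighting.
        poly = Polynomial.fit(x, y, degree, w=1.0 / sigma)
        chi2[degree] = float(np.sum(((y - poly(x)) / sigma) ** 2))
    return chi2

def nested_f_statistic(chi2_low, chi2_high, k_low, k_high, n):
    """F statistic for whether the k_high-parameter fit improves on the nested k_low-parameter fit."""
    return ((chi2_low - chi2_high) / (k_high - k_low)) / (chi2_high / (n - k_high))
```

    A polynomial of degree d has d + 1 coefficients, so comparing degree 1 against degree 2 uses k_low = 2 and k_high = 3. A large F value suggests that the extra term is worth keeping; turning it into a P value requires the F distribution with (k_high - k_low, n - k_high) degrees of freedom.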

  8. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

    2016-01-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

  9. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    dos Santos, T. S.; Mendes, D.; Torres, R. R.

    2015-08-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used GCM experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.

  10. Noninvasive and fast measurement of blood glucose in vivo by near infrared (NIR) spectroscopy

    NASA Astrophysics Data System (ADS)

    Jintao, Xue; Liming, Ye; Yufei, Liu; Chunyan, Li; Han, Chen

    2017-05-01

    This research aimed to develop a method for noninvasive and fast blood glucose assay in vivo. Near-infrared (NIR) spectroscopy, a more promising technique compared to other methods, was investigated in rats with diabetes and normal rats. Calibration models were generated by two different multivariate strategies: partial least squares (PLS) as a linear regression method and artificial neural networks (ANN) as a non-linear regression method. The PLS model was optimized individually by considering spectral range, spectral pretreatment methods and number of model factors, while the ANN model was studied individually by selecting spectral pretreatment methods, parameters of network topology, number of hidden neurons, and number of epochs. The results of the validation showed the two models were robust, accurate and repeatable. Compared to the ANN model, the performance of the PLS model was much better, with a lower root mean square error of validation (RMSEP) of 0.419 and a higher correlation coefficient (R) of 96.22%.

  11. Commensurate Priors for Incorporating Historical Information in Clinical Trials Using General and Generalized Linear Models

    PubMed Central

    Hobbs, Brian P.; Sargent, Daniel J.; Carlin, Bradley P.

    2014-01-01

    Assessing between-study variability in the context of conventional random-effects meta-analysis is notoriously difficult when incorporating data from only a small number of historical studies. In order to borrow strength, historical and current data are often assumed to be fully homogeneous, but this can have drastic consequences for power and Type I error if the historical information is biased. In this paper, we propose empirical and fully Bayesian modifications of the commensurate prior model (Hobbs et al., 2011) extending Pocock (1976), and evaluate their frequentist and Bayesian properties for incorporating patient-level historical data using general and generalized linear mixed regression models. Our proposed commensurate prior models lead to preposterior admissible estimators that facilitate alternative bias-variance trade-offs than those offered by pre-existing methodologies for incorporating historical data from a small number of historical studies. We also provide a sample analysis of a colon cancer trial comparing time-to-disease progression using a Weibull regression model. PMID:24795786

  12. Relationships between age and dental attrition in Australian aboriginals.

    PubMed

    Richards, L C; Miller, S L

    1991-02-01

    Tooth wear scores (ratios of exposed dentin to total crown area) were calculated from dental casts of Australian Aboriginal subjects of known age from three populations. Linear regression equations relating attrition scores to age were derived. The slope of the regression line reflects the rate of tooth wear, and the intercept is related to the timing of first exposure of dentin. Differences in morphology between anterior and posterior teeth are reflected in a linear relationship between attrition scores and age for anterior teeth but a logarithmic relationship for posterior teeth. Correlations between age and attrition range from less than 0.40 for third molars (where differences in the eruption and occlusion of the teeth resulted in different patterns of wear) to greater than 0.80 for the premolars and first molars. Because of the generally high correlations between age and attrition, it is possible to estimate age from the extent of tooth wear with confidence limits of the order of +/- 10 years.

  13. Regression of non-linear coupling of noise in LIGO detectors

    NASA Astrophysics Data System (ADS)

    Da Silva Costa, C. F.; Billman, C.; Effler, A.; Klimenko, S.; Cheng, H.-P.

    2018-03-01

    In 2015, after their upgrade, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors started acquiring data. The effort to improve their sensitivity has never stopped since then. The goal to achieve design sensitivity is challenging. Environmental and instrumental noise couple to the detector output with different, linear and non-linear, coupling mechanisms. The noise regression method we use is based on the Wiener–Kolmogorov filter, which uses witness channels to make noise predictions. We present here how this method helped to determine complex non-linear noise couplings in the output mode cleaner and in the mirror suspension system of the LIGO detector.

  14. No association of smoke-free ordinances with profits from bingo and charitable games in Massachusetts.

    PubMed

    Glantz, S A; Wilson-Loots, R

    2003-12-01

    Because bingo is widely played, the claim that smoking restrictions will adversely affect bingo games is used as an argument against these policies. We used publicly available data from Massachusetts to assess the impact of 100% smoke-free ordinances on profits from bingo and other gambling sponsored by charitable organisations between 1985 and 2001. We conducted two analyses: (1) a general linear model implementation of a time series analysis with net profits (adjusted to 2001 dollars) as the dependent variable, and community (as a fixed effect), year, lagged net profits, and the length of time the ordinance had been in force as the independent variables; (2) multiple linear regression of total state profits against time, lagged profits, and the percentage of the entire state population in communities that allow charitable gaming but prohibit smoking. The general linear model analysis of data from individual communities showed that, while adjusted profits fell over time, this effect was not related to the presence of an ordinance. The analysis in terms of the fraction of the population living in communities with ordinances yielded the same result. Policymakers can implement smoke-free policies without concern that these policies will affect charitable gaming.

  15. DQM: Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

    NASA Astrophysics Data System (ADS)

    Mokhtari, Aryan; Shi, Wei; Ling, Qing; Ribeiro, Alejandro

    2016-10-01

    This paper considers decentralized consensus optimization problems where nodes of a network have access to different summands of a global objective function. Nodes cooperate to minimize the global objective by exchanging information with neighbors only. A decentralized version of the alternating directions method of multipliers (DADMM) is a common method for solving this category of problems. DADMM exhibits linear convergence rate to the optimal objective but its implementation requires solving a convex optimization problem at each iteration. This can be computationally costly and may result in large overall convergence times. The decentralized quadratically approximated ADMM algorithm (DQM), which minimizes a quadratic approximation of the objective function that DADMM minimizes at each iteration, is proposed here. The consequent reduction in computational time is shown to have minimal effect on convergence properties. Convergence still proceeds at a linear rate with a guaranteed constant that is asymptotically equivalent to the DADMM linear convergence rate constant. Numerical results demonstrate advantages of DQM relative to DADMM and other alternatives in a logistic regression problem.

  16. Wavelet analysis for the study of the relations among soil radon anomalies, volcanic and seismic events: the case of Mt. Etna (Italy)

    NASA Astrophysics Data System (ADS)

    Ferrera, Elisabetta; Giammanco, Salvatore; Cannata, Andrea; Montalto, Placido

    2013-04-01

    From November 2009 to April 2011 soil radon activity was continuously monitored using a Barasol® probe located on the upper NE flank of Mt. Etna volcano, close both to the Piano Provenzana fault and to the NE-Rift. Seismic and volcanological data have been analyzed together with radon data. We also analyzed air and soil temperature, barometric pressure, snow and rain fall data. In order to find possible correlations among the above parameters, and hence to reveal possible anomalies in the radon time-series, we used different statistical methods: i) multivariate linear regression; ii) cross-correlation; iii) coherence analysis through wavelet transform. Multivariate regression indicated a modest influence on soil radon from environmental parameters (R^2 = 0.31). When using 100-day time windows, the R^2 values showed wide variations in time, reaching their maxima (~0.63-0.66) during summer. Cross-correlation analysis over 100-day moving averages showed that, similar to multivariate linear regression analysis, the summer period is characterised by the best correlation between radon data and environmental parameters. Lastly, the wavelet coherence analysis allowed a multi-resolution coherence analysis of the time series acquired. This approach allows the relations among different signals to be studied in either the time or the frequency domain. It confirmed the results of the previous methods, but also allowed correlations between radon and environmental parameters to be recognized at different observation scales (e.g., radon activity changed during strong precipitations, but also during anomalous variations of soil temperature uncorrelated with seasonal fluctuations). Our work suggests that in order to make an accurate analysis of the relations among distinct signals it is necessary to use different techniques that give complementary analytical information. 
In particular, the wavelet analysis proved to be very effective in discriminating radon changes due to environmental influences from those correlated with impending seismic or volcanic events.
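
    The windowed regression mentioned in this record (R^2 recomputed over 100-day windows) can be sketched as below. This is an illustrative assumption about how such a calculation might be set up, not the authors' procedure: the function name is hypothetical, and the window is advanced one sample at a time.

```python
import numpy as np

def rolling_r2(X, y, window):
    """R^2 of an ordinary least-squares fit of y on X within each sliding window."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r2 = []
    for start in range(len(y) - window + 1):
        # Design matrix for this window: intercept plus the predictor columns.
        Xw = np.column_stack([np.ones(window), X[start:start + window]])
        yw = y[start:start + window]
        beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
        rss = np.sum((yw - Xw @ beta) ** 2)   # unexplained variation
        tss = np.sum((yw - yw.mean()) ** 2)   # total variation in the window
        r2.append(1.0 - rss / tss)
    return np.array(r2)
```

    With daily samples, rolling_r2(environment, radon, 100) would return one R^2 value per window start, which can then be plotted against time to look for seasonal changes in how much of the signal the predictors explain.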

  17. Instantaneous global spatial interaction? Exploring the Gaussian inequality, distance and Internet pings in a global network

    NASA Astrophysics Data System (ADS)

    Baker, R. G. V.

    2005-12-01

    The Internet has been publicly portrayed as a new technological horizon yielding instantaneous interaction to a point where geography no longer matters. This research aims to dispel this impression by applying a dynamic form of trip modelling to investigate pings in a global computer network compiled by the Stanford Linear Accelerator Centre (SLAC) from 1998 to 2004. Internet flows have been predicted to have the same mathematical operators as trips to a supermarket, since they are both periodic and constrained by a distance metric. Both actual and virtual trips are part of a spectrum of origin-destination pairs in the time-space convergence of trip time-lines. Internet interaction is very near to the convergence of these time-lines (at a very small time scale in milliseconds, but with interactions over thousands of kilometres). There is a lag effect and this is formalised by the derivation of Gaussian and gravity inequalities between the time taken (Δt) and the partitioning of distance (Δx). This inequality seems to be robust for a regression of Δt to Δx in the SLAC data set for each year (1998 to 2004). There is a constant ‘forbidden zone’ in the interaction, underpinned by the fact that pings do not travel faster than the speed of light. Superimposed upon this zone is the network capacity where a linear regression of Δt to Δx is a proxy summarising global Internet connectivity for that year. The results suggest that there has been a substantial improvement in connectivity over the period with R^2 increasing steadily from 0.39 to 0.65 from less Gaussian spreading of the ping latencies. Further, the regression line shifts towards the inequality boundary from 1998 to 2004, where the increased slope shows a greater proportional rise in local connectivity over global connectivity. A conclusion is that national geography still does matter in spatial interaction modelling of the Internet.

  18. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

    PubMed

    Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

    2012-12-01

    A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere polybutadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization (ACO) is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (log k_w). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. The PX-EM algorithm for fast stable fitting of Henderson's mixed model

    PubMed Central

    Foulley, Jean-Louis; Van Dyk, David A

    2000-01-01

    This paper presents procedures for implementing the PX-EM algorithm of Liu, Rubin and Wu to compute REML estimates of variance covariance components in Henderson's linear mixed models. The class of models considered encompasses several correlated random factors having the same vector length e.g., as in random regression models for longitudinal data analysis and in sire-maternal grandsire models for genetic evaluation. Numerical examples are presented to illustrate the procedures. Much better results in terms of convergence characteristics (number of iterations and time required for convergence) are obtained for PX-EM relative to the basic EM algorithm in the random regression. PMID:14736399

  20. Complex messages regarding a thin ideal appearing in teenage girls' magazines from 1956 to 2005.

    PubMed

    Luff, Gina M; Gray, James J

    2009-03-01

    Seventeen and YM were assessed from 1956 through 2005 (n=312) to examine changes in the messages about thinness sent to teenage women. Trends were analyzed through an investigation of written, internal content focused on dieting, exercise, or both, while cover models were examined to explore fluctuations in body size. Pearson's Product correlations and weighted-least squares linear regression models were used to demonstrate changes over time. The frequency of written content related to exercise and combined plans increased in Seventeen, while a curvilinear relationship between time and content relating to dieting appeared. YM showed a linear increase in content related to dieting, exercise, and combined plans. Average cover model body size increased over time in YM while demonstrating no significant changes in Seventeen. Overall, more written messages about dieting and exercise appeared in teen's magazines in 2005 than before while the average cover model body size increased.

  1. Multiscale characterization and prediction of monsoon rainfall in India using Hilbert-Huang transform and time-dependent intrinsic correlation analysis

    NASA Astrophysics Data System (ADS)

    Adarsh, S.; Reddy, M. Janga

    2017-07-01

    In this paper, the Hilbert-Huang transform (HHT) approach is used for the multiscale characterization of All India Summer Monsoon Rainfall (AISMR) time series and monsoon rainfall time series from five homogeneous regions in India. The study employs the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) for multiscale decomposition of monsoon rainfall in India and uses the Normalized Hilbert Transform and Direct Quadrature (NHT-DQ) scheme for the time-frequency characterization. The cross-correlation analysis between orthogonal modes of All India monthly monsoon rainfall time series and that of five climate indices such as Quasi Biennial Oscillation (QBO), El Niño Southern Oscillation (ENSO), Sunspot Number (SN), Atlantic Multi Decadal Oscillation (AMO), and Equatorial Indian Ocean Oscillation (EQUINOO) in the time domain showed that the links of different climate indices with monsoon rainfall are expressed well only for few low-frequency modes and for the trend component. Furthermore, this paper investigated the hydro-climatic teleconnection of ISMR in multiple time scales using the HHT-based running correlation analysis technique called time-dependent intrinsic correlation (TDIC). The results showed that both the strength and nature of association between different climate indices and ISMR vary with time scale. Stemming from this finding, a methodology employing Multivariate extension of EMD and Stepwise Linear Regression (MEMD-SLR) is proposed for prediction of monsoon rainfall in India. The proposed MEMD-SLR method clearly exhibited superior performance over the IMD operational forecast, M5 Model Tree (MT), and multiple linear regression methods in ISMR predictions and displayed excellent predictive skill during 1989-2012 including the four extreme events that have occurred during this period.

  2. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  3. SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES

    PubMed Central

    Zhu, Liping; Huang, Mian; Li, Runze

    2012-01-01

    This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536

  4. [Association between hours of television watched, physical activity, sleep and excess weight among young adults].

    PubMed

    Martínez-Moyá, María; Navarrete-Muñoz, Eva M; García de la Hera, Manuela; Giménez-Monzo, Daniel; González-Palacios, Sandra; Valera-Gran, Desirée; Sempere-Orts, María; Vioque, Jesús

    2014-01-01

    To explore the association between excess weight or body mass index (BMI) and the time spent watching television, self-reported physical activity and sleep duration in a young adult population. We analyzed cross-sectional baseline data of 1,135 participants (17-35 years old) from the project Dieta, salud y antropometría en población universitaria (Diet, Health and Anthropometric Variables in University Students). Information about time spent watching television, sleep duration, self-reported physical activity and self-reported height and weight was provided by a baseline questionnaire. BMI was calculated as kg/m^2 and excess weight was defined as BMI ≥25. We used multiple logistic regression to explore the association between excess weight (no/yes) and independent variables, and multiple linear regression for BMI. The prevalence of excess weight was 13.7% (11.2% were overweight and 2.5% were obese). A significant positive association was found between excess weight and a greater amount of time spent watching television. Participants who reported watching television >2h a day had a higher risk of excess weight than those who watched television ≤1h a day (OR=2.13; 95%CI: 1.37-3.36; p-trend: 0.002). A lower level of physical activity was associated with an increased risk of excess weight, although the association was statistically significant only in multiple linear regression (p=0.037). No association was observed with sleep duration. A greater number of hours spent watching television and lower physical activity were significantly associated with a higher BMI in young adults. Both factors are potentially modifiable with preventive strategies. Copyright © 2013 SESPAS. Published by Elsevier Espana. All rights reserved.

  5. Combining fixed effects and instrumental variable approaches for estimating the effect of psychosocial job quality on mental health: evidence from 13 waves of a nationally representative cohort study.

    PubMed

    Milner, Allison; Aitken, Zoe; Kavanagh, Anne; LaMontagne, Anthony D; Pega, Frank; Petrie, Dennis

    2017-06-23

    Previous studies suggest that poor psychosocial job quality is a risk factor for mental health problems, but they use conventional regression analytic methods that cannot rule out reverse causation, unmeasured time-invariant confounding and reporting bias. This study combines two quasi-experimental approaches to improve causal inference by better accounting for these biases: (i) linear fixed effects regression analysis and (ii) linear instrumental variable analysis. We extract 13 annual waves of national cohort data including 13 260 working-age (18-64 years) employees. The exposure variable is self-reported level of psychosocial job quality. The instruments used are two common workplace entitlements. The outcome variable is the Mental Health Inventory (MHI-5). We adjust for measured time-varying confounders. In the fixed effects regression analysis adjusted for time-varying confounders, a 1-point increase in psychosocial job quality is associated with a 1.28-point improvement in mental health on the MHI-5 scale (95% CI: 1.17, 1.40; P < 0.001). When the fixed effects analysis was combined with the instrumental variable analysis, a 1-point increase in psychosocial job quality was related to a 1.62-point improvement on the MHI-5 scale (95% CI: -0.24, 3.48; P = 0.088). Our quasi-experimental results provide evidence to confirm job stressors as risk factors for mental ill health using methods that improve causal inference. © The Author 2017. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
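
    The fixed effects step described above can be illustrated with a minimal sketch of the "within" estimator for a single exposure. This is not the authors' code: the function name, the two hypothetical workers and their scores are assumptions made only for illustration, and the instrumental variable step is not shown.

```python
from collections import defaultdict


def within_estimator(person_ids, x, y):
    """Slope of y on x after subtracting each person's own means.

    Demeaning within person removes any time-invariant personal
    characteristic, measured or not, from both variables.
    """
    groups = defaultdict(list)
    for pid, xi, yi in zip(person_ids, x, y):
        groups[pid].append((xi, yi))

    sxy = 0.0
    sxx = 0.0
    for rows in groups.values():
        mean_x = sum(r[0] for r in rows) / len(rows)
        mean_y = sum(r[1] for r in rows) / len(rows)
        for xi, yi in rows:
            sxy += (xi - mean_x) * (yi - mean_y)
            sxx += (xi - mean_x) ** 2
    return sxy / sxx


# Hypothetical panel: two workers, three waves each (illustrative values).
ids = ["a", "a", "a", "b", "b", "b"]
job_quality = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Each worker has their own baseline mental health (a time-invariant
# confounder); the true within-person effect is 1.5 points per unit.
baseline = {"a": 70.0, "b": 50.0}
mental_health = [baseline[p] + 1.5 * q for p, q in zip(ids, job_quality)]

fe_slope = within_estimator(ids, job_quality, mental_health)
```

    In this toy panel the worker with higher job quality has the lower baseline, so a pooled regression that ignores the person identifier would mix the baseline difference into the slope; the within estimator does not.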

  6. Modeling of time trends and interactions in vital rates using restricted regression splines.

    PubMed

    Heuer, C

    1997-03-01

    For the analysis of time trends in incidence and mortality rates, the age-period-cohort (apc) model has become a widely accepted method. The considered data are arranged in a two-way table by age group and calendar period, which are mostly subdivided into 5- or 10-year intervals. The disadvantage of this approach is the loss of information by data aggregation and the problems of estimating interactions in the two-way layout without replications. In this article we show how splines can be useful when yearly data, i.e., 1-year age groups and 1-year periods, are given. The estimated spline curves are still smooth and represent yearly changes in the time trends. Further, it is straightforward to include interaction terms by the tensor product of the spline functions. If the data are given in a nonrectangular table, e.g., 5-year age groups and 1-year periods, the period and cohort variables can be parameterized by splines, while the age variable is parameterized as fixed effect levels, which leads to a semiparametric apc model. An important methodological issue in developing the nonparametric and semiparametric models is stability of the estimated spline curve at the boundaries. Here cubic regression splines will be used, which are constrained to be linear in the tails. Another point of importance is the nonidentifiability problem due to the linear dependency of the three time variables. This will be handled by decomposing the basis of each spline by orthogonal projection into constant, linear, and nonlinear terms, as suggested by Holford (1983, Biometrics 39, 311-324) for the traditional apc model. The advantage of using splines for yearly data compared to the traditional approach for aggregated data is the more accurate curve estimation for the nonlinear trend changes and the simple way of modeling interactions between the time variables. The method will be demonstrated with hypothetical data as well as with cancer mortality data.
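
    A cubic regression spline that is constrained to be linear in the tails can be sketched with the truncated-power form of the restricted cubic spline basis. The abstract does not say which parameterization the author used, so the formula, the function name and the knot positions below are assumptions chosen for illustration.

```python
def rcs_basis(x, knots):
    """Nonlinear restricted-cubic-spline terms for a single value x.

    With k knots there are k - 2 nonlinear terms. Together with an
    intercept and a plain linear term in x they span cubic splines
    that are linear below the first knot and above the last knot.
    """
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("at least three knots are required")

    def pos_cube(u):
        # Truncated cubic: u**3 for u > 0, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    last_gap = t[-1] - t[-2]
    terms = []
    for tj in t[:-2]:
        terms.append(
            pos_cube(x - tj)
            - pos_cube(x - t[-2]) * (t[-1] - tj) / last_gap
            + pos_cube(x - t[-1]) * (t[-2] - tj) / last_gap
        )
    return terms


# Hypothetical knots on a "years since start" axis.
knots = [0, 5, 10, 15]
inside = rcs_basis(7, knots)      # curved region between the knots
below = rcs_basis(-3, knots)      # all terms vanish below the first knot
```

    The two correction terms cancel the quadratic and cubic parts above the last knot, which is what keeps the fitted curve from swinging at the boundaries.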

  7. Factors that influence standard automated perimetry test results in glaucoma: test reliability, technician experience, time of day, and season.

    PubMed

    Junoy Montolio, Francisco G; Wesselink, Christiaan; Gordijn, Marijke; Jansonius, Nomdo M

    2012-10-09

    To determine the influence of several factors on standard automated perimetry test results in glaucoma. Longitudinal Humphrey field analyzer 30-2 Swedish interactive threshold algorithm data from 160 eyes of 160 glaucoma patients were used. The influence of technician experience, time of day, and season on the mean deviation (MD) was determined by performing linear regression analysis of MD against time on a series of visual fields and subsequently performing a multiple linear regression analysis with the MD residuals as dependent variable and the factors mentioned above as independent variables. Analyses were performed with and without adjustment for the test reliability (fixation losses and false-positive and false-negative answers) and with and without stratification according to disease stage (baseline MD). Mean follow-up was 9.4 years, with on average 10.8 tests per patient. Technician experience, time of day, and season were associated with the MD. Approximately 0.2 dB lower MD values were found for inexperienced technicians (P < 0.001), tests performed after lunch (P < 0.001), and tests performed in the summer or autumn (P < 0.001). The effects of time of day and season appeared to depend on disease stage. Independent of these effects, the percentage of false-positive answers strongly influenced the MD with a 1 dB increase in MD per 10% increase in false-positive answers. Technician experience, time of day, season, and the percentage of false-positive answers have a significant influence on the MD of standard automated perimetry.
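
    The two-step analysis described above (a trend regression of MD against time, followed by a regression of the MD residuals on test conditions) can be sketched for a single hypothetical patient and a single binary factor. The data values, variable names and the restriction to one factor are assumptions; the study itself used several factors in a multiple regression.

```python
def ols(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical series for one eye: years since baseline and whether each
# test was taken after lunch (1) or before (0). The pattern is chosen so
# that the factor is uncorrelated with time in this sample.
years = [0, 1, 2, 3, 4, 5, 6, 7]
after_lunch = [1, 0, 0, 1, 1, 0, 0, 1]
# Simulated MD in dB: progression of -0.5 dB/year plus a -0.2 dB offset
# for afternoon tests (both values are illustrative).
md = [-3.0 - 0.5 * t - 0.2 * a for t, a in zip(years, after_lunch)]

# Step 1: remove the disease trend.
intercept, trend = ols(years, md)
residuals = [m - (intercept + trend * t) for m, t in zip(md, years)]

# Step 2: relate what is left to the test condition.
_, lunch_effect = ols(after_lunch, residuals)
```

    Because the factor is balanced over time here, the second step recovers the simulated offset; with real data any correlation between a factor and time would be partly absorbed by the first step.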

  8. An adaptive two-stage analog/regression model for probabilistic prediction of small-scale precipitation in France

    NASA Astrophysics Data System (ADS)

    Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine

    2018-01-01

    Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation, for instance, and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the values of the corresponding regression coefficients can vary from one prediction day to another. The model thus allows for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and infrequent weather configurations.

  9. The impact of perceived intensity and frequency of police work occupational stressors on the cortisol awakening response (CAR): Findings from the BCOPS study.

    PubMed

    Violanti, John M; Fekedulegn, Desta; Andrew, Michael E; Hartley, Tara A; Charles, Luenda E; Miller, Diane B; Burchfiel, Cecil M

    2017-01-01

    Police officers encounter unpredictable, evolving, and escalating stressful demands in their work. Utilizing the Spielberger Police Stress Survey (60-item instrument for assessing specific conditions or events considered to be stressors in police work), the present study examined the association of the top five highly rated and bottom five least rated work stressors among police officers with their awakening cortisol pattern. Participants were police officers enrolled in the Buffalo Cardio-Metabolic Occupational Police Stress (BCOPS) study (n=338). For each group, the total stress index (product of rating and frequency of the stressor) was calculated. Participants collected saliva by means of Salivettes at four time points: on awakening, 15, 30 and 45 min after waking to examine the cortisol awakening response (CAR). Saliva samples were analyzed for free cortisol concentrations. A slope reflecting the awakening pattern of cortisol over time was estimated by fitting a linear regression model relating cortisol in log-scale to time of collection. The slope served as the outcome variable. Analysis of covariance, regression, and repeated measures models were used to determine if there was an association of the stress index with the waking cortisol pattern. There was a significant negative linear association between total stress index of the five highest stressful events and slope of the awakening cortisol regression line (trend p-value=0.0024). As the stress index increased, the pattern of the awakening cortisol regression line tended to flatten. Officers with a zero stress index showed a steep and steady increase in cortisol from baseline (which is often observed) while officers with a moderate or high stress index showed a dampened or flatter response over time. Conversely, the total stress index of the five least rated events was not significantly associated with the awakening cortisol pattern.
    The study suggests that police events or conditions considered highly stressful by the officers may be associated with disturbances of the typical awakening cortisol pattern. The results are consistent with previous research where chronic exposure to stressors is associated with a diminished awakening cortisol response pattern. Copyright © 2016 Elsevier Ltd. All rights reserved.
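
    The outcome variable described above, the slope of log cortisol against collection time, can be sketched as follows. The cortisol values and growth rates are invented for illustration, the function name is hypothetical, and the natural logarithm is an assumption because the abstract only says "log-scale".

```python
import math


def car_slope(minutes, cortisol):
    """Least-squares slope of log(cortisol) on minutes since awakening."""
    logs = [math.log(c) for c in cortisol]
    n = len(minutes)
    mean_t = sum(minutes) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in minutes)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(minutes, logs))
    return stl / stt


# The four collection times named in the abstract.
minutes = [0, 15, 30, 45]

# Two hypothetical officers: a steep rise and a dampened rise.
steep = [10.0 * math.exp(0.020 * t) for t in minutes]
flat = [10.0 * math.exp(0.002 * t) for t in minutes]

steep_slope = car_slope(minutes, steep)
flat_slope = car_slope(minutes, flat)
```

    Each officer contributes one slope, and it is these per-person slopes that are then related to the stress index in the second-level models.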

  10. The impact of perceived intensity and frequency of police work occupational stressors on the cortisol awakening response (CAR): Findings from the BCOPS study

    PubMed Central

    Violanti, John M.; Fekedulegn, Desta; Andrew, Michael E.; Hartley, Tara A.; Charles, Luenda E.; Miller, Diane B.; Burchfiel, Cecil M.

    2016-01-01

    Police officers encounter unpredictable, evolving, and escalating stressful demands in their work. Utilizing the Spielberger Police Stress Survey (60-item instrument for assessing specific conditions or events considered to be stressors in police work), the present study examined the association of the top five highly rated and bottom five least rated work stressors among police officers with their awakening cortisol pattern. Participants were police officers enrolled in the Buffalo Cardio-Metabolic Occupational Police Stress (BCOPS) study (n = 338). For each group, the total stress index (product of rating and frequency of the stressor) was calculated. Participants collected saliva by means of Salivettes at four time points: on awakening, 15, 30 and 45 min after waking to examine the cortisol awakening response (CAR). Saliva samples were analyzed for free cortisol concentrations. A slope reflecting the awakening pattern of cortisol over time was estimated by fitting a linear regression model relating cortisol in log-scale to time of collection. The slope served as the outcome variable. Analysis of covariance, regression, and repeated measures models were used to determine if there was an association of the stress index with the waking cortisol pattern. There was a significant negative linear association between total stress index of the five highest stressful events and slope of the awakening cortisol regression line (trend p-value = 0.0024). As the stress index increased, the pattern of the awakening cortisol regression line tended to flatten. Officers with a zero stress index showed a steep and steady increase in cortisol from baseline (which is often observed) while officers with a moderate or high stress index showed a dampened or flatter response over time. Conversely, the total stress index of the five least rated events was not significantly associated with the awakening cortisol pattern. 
    The study suggests that police events or conditions considered highly stressful by the officers may be associated with disturbances of the typical awakening cortisol pattern. The results are consistent with previous research where chronic exposure to stressors is associated with a diminished awakening cortisol response pattern. PMID:27816820

  11. Prediction of siRNA potency using sparse logistic regression.

    PubMed

    Hu, Wei; Hu, John

    2014-06-01

    RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy as the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total of 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.

  12. Time Series Analysis for Forecasting Hospital Census: Application to the Neonatal Intensive Care Unit

    PubMed Central

    Hoover, Stephen; Jackson, Eric V.; Paul, David; Locke, Robert

    2016-01-01

    Background: Accurate prediction of future patient census in hospital units is essential for patient safety, health outcomes, and resource planning. Forecasting census in the Neonatal Intensive Care Unit (NICU) is particularly challenging due to limited ability to control the census and clinical trajectories. The fixed average census approach, using average census from previous year, is a forecasting alternative used in clinical practice, but has limitations due to census variations. Objective: Our objectives are to: (i) analyze the daily NICU census at a single health care facility and develop census forecasting models, (ii) explore models with and without patient data characteristics obtained at the time of admission, and (iii) evaluate accuracy of the models compared with the fixed average census approach. Methods: We used five years of retrospective daily NICU census data for model development (January 2008 – December 2012, N=1827 observations) and one year of data for validation (January – December 2013, N=365 observations). Best-fitting models of ARIMA and linear regression were applied to various 7-day prediction periods and compared using error statistics. Results: The census showed a slightly increasing linear trend. Best fitting models included a non-seasonal model, ARIMA(1,0,0), seasonal ARIMA models, ARIMA(1,0,0)x(1,1,2)7 and ARIMA(2,1,4)x(1,1,2)14, as well as a seasonal linear regression model. Proposed forecasting models resulted on average in 36.49% improvement in forecasting accuracy compared with the fixed average census approach. Conclusions: Time series models provide higher prediction accuracy under different census conditions compared with the fixed average census approach. Presented methodology is easily applicable in clinical practice, can be generalized to other care settings, support short- and long-term census forecasting, and inform staff resource planning. PMID:27437040

  13. Time Series Analysis for Forecasting Hospital Census: Application to the Neonatal Intensive Care Unit.

    PubMed

    Capan, Muge; Hoover, Stephen; Jackson, Eric V; Paul, David; Locke, Robert

    2016-01-01

    Accurate prediction of future patient census in hospital units is essential for patient safety, health outcomes, and resource planning. Forecasting census in the Neonatal Intensive Care Unit (NICU) is particularly challenging due to limited ability to control the census and clinical trajectories. The fixed average census approach, using average census from previous year, is a forecasting alternative used in clinical practice, but has limitations due to census variations. Our objectives are to: (i) analyze the daily NICU census at a single health care facility and develop census forecasting models, (ii) explore models with and without patient data characteristics obtained at the time of admission, and (iii) evaluate accuracy of the models compared with the fixed average census approach. We used five years of retrospective daily NICU census data for model development (January 2008 - December 2012, N=1827 observations) and one year of data for validation (January - December 2013, N=365 observations). Best-fitting models of ARIMA and linear regression were applied to various 7-day prediction periods and compared using error statistics. The census showed a slightly increasing linear trend. Best fitting models included a non-seasonal model, ARIMA(1,0,0), seasonal ARIMA models, ARIMA(1,0,0)x(1,1,2)7 and ARIMA(2,1,4)x(1,1,2)14, as well as a seasonal linear regression model. Proposed forecasting models resulted on average in 36.49% improvement in forecasting accuracy compared with the fixed average census approach. Time series models provide higher prediction accuracy under different census conditions compared with the fixed average census approach. Presented methodology is easily applicable in clinical practice, can be generalized to other care settings, support short- and long-term census forecasting, and inform staff resource planning.
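
    The comparison against the fixed average census approach can be sketched as a percent improvement in a forecast error statistic. The sketch below swaps in a simple day-of-week mean forecast in place of the ARIMA and seasonal regression models, uses mean absolute error because the abstract does not name its error statistics, and runs on an invented census series; all function names are hypothetical.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def fixed_average_forecast(train, horizon):
    """Baseline: repeat the historical average census for every day."""
    average = sum(train) / len(train)
    return [average] * horizon


def day_of_week_forecast(train, horizon, period=7):
    """Stand-in seasonal model: average census by position in the week.

    Assumes the training series covers at least one full period.
    """
    sums = [0.0] * period
    counts = [0] * period
    for i, value in enumerate(train):
        sums[i % period] += value
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    start = len(train)
    return [means[(start + h) % period] for h in range(horizon)]


def percent_improvement(baseline_error, model_error):
    """Relative reduction in error versus the baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Invented census with a perfectly repeating weekly pattern.
week = [10, 12, 14, 16, 14, 12, 10]
train = week * 4
validation = week  # the next 7-day prediction period

baseline_error = mean_absolute_error(
    validation, fixed_average_forecast(train, len(validation)))
seasonal_error = mean_absolute_error(
    validation, day_of_week_forecast(train, len(validation)))
improvement = percent_improvement(baseline_error, seasonal_error)
```

    The toy series repeats exactly, so the seasonal stand-in has no error here; real census data would give a smaller improvement.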

  14. Comparative analysis on the probability of being a good payer

    NASA Astrophysics Data System (ADS)

    Mihova, V.; Pavlov, V.

    2017-10-01

    Credit risk assessment is crucial for the banking industry. Current practice uses various approaches to calculate credit risk. At the core of these approaches are multiple regression models, applied to assess the risk associated with approving people who apply for certain products (loans, credit cards, etc.). Based on data from the past, these models try to predict what will happen in the future. Different data require different types of models. This work studies the causal link between an applicant's conduct in repaying the loan and the data that the applicant provided at the time of application. A database of 100 borrowers from a commercial bank is used for the purposes of the study. The available data include information from the time of application and the credit history while paying off the loan. Customers are divided into two groups based on credit history: Good and Bad payers. Linear and logistic regression are applied in parallel to the data in order to estimate the probability of being good for new borrowers. A variable that takes the value 1 for Good borrowers and 0 for Bad borrowers is modeled as the dependent variable. To decide which of the variables listed in the database should be used in the modelling process (as independent variables), a correlation analysis is performed. Based on its results, several combinations of independent variables are tested as initial models, with both linear and logistic regression. The best linear and logistic models are obtained after an initial transformation of the data and by following a set of standard and robust statistical criteria. A comparative analysis of the two final models is made, and scorecards are obtained from both models to assess new customers at the time of application.
    A cut-off level of points, below which applications are rejected and above which they are accepted, is suggested for both models, applying the strategy of keeping the same Accept Rate as in the current data.
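
    Fitting linear and logistic regression in parallel to a 0/1 outcome can be sketched with one predictor. The applicant scores and outcomes below are invented, the function names are hypothetical, and the Newton iteration is just one common way to fit a logistic model; the abstract does not describe the authors' software or variables.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_linear(x, y):
    """Ordinary least squares on the 0/1 outcome: (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_logistic(x, y, iterations=25):
    """Logistic regression with one predictor via Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # negative Hessian entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented applicant scores and outcomes (1 = Good payer, 0 = Bad payer).
score = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
good = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

lin_b0, lin_b1 = fit_linear(score, good)
log_b0, log_b1 = fit_logistic(score, good)

# Predicted "probability" for a hypothetical new applicant with score 12.
linear_prediction = lin_b0 + lin_b1 * 12
logistic_prediction = sigmoid(log_b0 + log_b1 * 12)
```

    The linear fit can return values outside the 0 to 1 range for applicants far from the training data, while the logistic fit cannot, which is one reason the two scorecards may rank or cut off applicants differently.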

  15. Predictive and mechanistic multivariate linear regression models for reaction development

    PubMed Central

    Santiago, Celine B.; Guo, Jing-Yao

    2018-01-01

    Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711

  16. Adding a Parameter Increases the Variance of an Estimated Regression Function

    ERIC Educational Resources Information Center

    Withers, Christopher S.; Nadarajah, Saralees

    2011-01-01

    The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…

  17. Using nonlinear quantile regression to estimate the self-thinning boundary curve

    Treesearch

    Quang V. Cao; Thomas J. Dean

    2015-01-01

    The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...

  18. Simultaneous spectrophotometric determination of salbutamol and bromhexine in tablets.

    PubMed

    Habib, I H I; Hassouna, M E M; Zaki, G A

    2005-03-01

    Typical anti-mucolytic drugs called salbutamol hydrochloride and bromhexine sulfate encountered in tablets were determined simultaneously either by using linear regression at zero-crossing wavelengths of the first derivative of the UV spectra or by application of the multiple linear partial least squares regression method. The results obtained by the two proposed mathematical methods were compared with those obtained by the HPLC technique.

  19. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    PubMed

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  20. Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

    PubMed

    Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

    2014-01-01

    The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binomial distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
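
    One quick check that motivates a negative binomial model for count data is the ratio of the sample variance to the sample mean, which is close to 1 for Poisson-like counts and larger for overdispersed counts. The conflict counts below are invented and the function name is hypothetical; the abstract does not say which diagnostics the authors used.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean of a list of counts."""
    return variance(counts) / mean(counts)


# Invented conflict counts per observation period at one approach.
conflicts = [0, 1, 0, 3, 8, 2, 0, 12, 1, 5]

ratio = dispersion_ratio(conflicts)
overdispersed = ratio > 1.0
```

    A ratio well above 1 suggests that both an ordinary linear model, which assumes constant variance, and a plain Poisson model are poor fits for such counts.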

  1. Combined chamber-tower approach: Using eddy covariance measurements to cross-validate carbon fluxes modeled from manual chamber campaigns

    NASA Astrophysics Data System (ADS)

    Brümmer, C.; Moffat, A. M.; Huth, V.; Augustin, J.; Herbst, M.; Kutsch, W. L.

    2016-12-01

    Manual carbon dioxide flux measurements with closed chambers at scheduled campaigns are a versatile method to study management effects at small scales in multiple-plot experiments. The eddy covariance technique has the advantage of quasi-continuous measurements but requires large homogeneous areas of a few hectares. To evaluate the uncertainties associated with interpolating from individual campaigns to the whole vegetation period, we installed both techniques at an agricultural site in Northern Germany. The presented comparison covers two cropping seasons, winter oilseed rape in 2012/13 and winter wheat in 2013/14. Modeling half-hourly carbon fluxes from campaigns is commonly performed based on non-linear regressions for the light response and respiration. The daily averages of net CO2 modeled from chamber data deviated from eddy covariance measurements in the range of ±5 g C m⁻² day⁻¹. To understand the observed differences and to disentangle the effects, we performed four additional setups (expert versus default settings of the non-linear regressions based algorithm, purely empirical modeling with artificial neural networks versus non-linear regressions, cross-validating using eddy covariance measurements as campaign fluxes, weekly versus monthly scheduling of campaigns) to model the half-hourly carbon fluxes for the whole vegetation period. The good agreement of the seasonal course of net CO2 at plot and field scale for our agricultural site demonstrates that both techniques are robust and yield consistent results at seasonal time scale even for a managed ecosystem with high temporal dynamics in the fluxes. This allows combining the respective advantages of factorial experiments at plot scale with dense time series data at field scale. Furthermore, the information from the quasi-continuous eddy covariance measurements can be used to derive vegetation proxies to support the interpolation of carbon fluxes in-between the manual chamber campaigns.

  2. Standards for Standardized Logistic Regression Coefficients

    ERIC Educational Resources Information Center

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  3. Image interpolation via regularized local linear regression.

    PubMed

    Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang

    2011-12-01

    The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting from the linear regression model, replacing the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the ℓ2-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
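
    The combination of a locally weighted (moving least squares) fit with an ℓ2 penalty and a closed-form solution can be sketched in one dimension. This is a deliberately reduced illustration and not the RLLR algorithm of the paper: it omits the manifold term and the image-specific design, and the Gaussian weighting, function name, bandwidth and data are all assumptions.

```python
import math


def local_ridge_fit(xs, ys, x0, bandwidth=1.0, lam=0.0):
    """Weighted, l2-penalized fit of y ~ a + b * (x - x0) around x0.

    Solves (X^T W X + lam * I) beta = X^T W y in closed form for the
    two coefficients; a is the interpolated value at x0.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for x, y in zip(xs, ys):
        u = x - x0
        w = math.exp(-(u * u) / (2.0 * bandwidth ** 2))  # nearer samples count more
        s0 += w
        s1 += w * u
        s2 += w * u * u
        r0 += w * y
        r1 += w * u * y
    s0 += lam
    s2 += lam
    det = s0 * s2 - s1 * s1
    a = (s2 * r0 - s1 * r1) / det
    b = (s0 * r1 - s1 * r0) / det
    return a, b


# Hypothetical 1-D "scan line": known samples around a missing position.
xs = [0.0, 1.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

value, slope = local_ridge_fit(xs, ys, x0=2.0)             # no penalty
_, damped_slope = local_ridge_fit(xs, ys, x0=2.0, lam=10.0)  # with penalty
```

    Without the penalty the fit reproduces an exactly linear signal; with it the coefficients are pulled toward zero, trading some bias for stability when few or noisy neighbours are available.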

  4. A modified temporal criterion to meta-optimize the extended Kalman filter for land cover classification of remotely sensed time series

    NASA Astrophysics Data System (ADS)

    Salmon, B. P.; Kleynhans, W.; Olivier, J. C.; van den Bergh, F.; Wessels, K. J.

    2018-05-01

    Humans are transforming land cover at an ever-increasing rate. Accurate geographical maps on land cover, especially rural and urban settlements are essential to planning sustainable development. Time series extracted from MODerate resolution Imaging Spectroradiometer (MODIS) land surface reflectance products have been used to differentiate land cover classes by analyzing the seasonal patterns in reflectance values. The proper fitting of a parametric model to these time series usually requires several adjustments to the regression method. To reduce the workload, a global setting of parameters is done to the regression method for a geographical area. In this work we have modified a meta-optimization approach to setting a regression method to extract the parameters on a per time series basis. The standard deviation of the model parameters and magnitude of residuals are used as scoring function. We successfully fitted a triply modulated model to the seasonal patterns of our study area using a non-linear extended Kalman filter (EKF). The approach uses temporal information which significantly reduces the processing time and storage requirements to process each time series. It also derives reliability metrics for each time series individually. The features extracted using the proposed method are classified with a support vector machine and the performance of the method is compared to the original approach on our ground truth data.

  5. Improved Correction of Atmospheric Pressure Data Obtained by Smartphones through Machine Learning

    PubMed Central

    Kim, Yong-Hyuk; Ha, Ji-Hun; Kim, Na-Young; Im, Hyo-Hyuc; Sim, Sangjin; Choi, Reno K. Y.

    2016-01-01

    A correction method using machine learning aims to improve the conventional linear regression (LR) based method for correction of atmospheric pressure data obtained by smartphones. The method proposed in this study conducts clustering and regression analysis with time domain classification. Data obtained using smartphones in Gyeonggi-do, one of the most populous provinces in South Korea, which surrounds Seoul and covers about 10,000 km², from July 2014 through December 2014 were classified with respect to time of day (daytime or nighttime) as well as day of the week (weekday or weekend) and the user's mobility, prior to the expectation-maximization (EM) clustering. Subsequently, the results were analyzed for comparison by applying machine learning methods such as multilayer perceptron (MLP) and support vector regression (SVR). The results showed a mean absolute error (MAE) 26% lower on average when regression analysis was performed through EM clustering compared to that obtained without EM clustering. Among the machine learning methods, the MAE for SVR was around 31% lower than for LR and about 19% lower than for MLP. It is concluded that pressure data from smartphones are as good as the ones from the national automatic weather station (AWS) network. PMID:27524999

  6. Comparing Machine Learning Classifiers and Linear/Logistic Regression to Explore the Relationship between Hand Dimensions and Demographic Characteristics

    PubMed Central

    2016-01-01

    Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID:27806075

  7. Comparing Machine Learning Classifiers and Linear/Logistic Regression to Explore the Relationship between Hand Dimensions and Demographic Characteristics.

    PubMed

    Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue

    2016-01-01

    Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.

  8. Comparison of various error functions in predicting the optimum isotherm by linear and non-linear regression analysis for the sorption of basic red 9 by activated carbon.

    PubMed

    Kumar, K Vasanth; Porkodi, K; Rocha, F

    2008-01-15

    A comparison of linear and non-linear regression methods in selecting the optimum isotherm was made using the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of the non-linear regression method, six error functions, namely the coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS), were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For the two parameter isotherms, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of the three parameter isotherms, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of the error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2), was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
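
    The abstract names the error functions without defining them. The sketch below uses the forms in which they commonly appear in the sorption literature; whether the paper uses exactly these normalizations is an assumption. The names q_exp (measured uptake), q_calc (isotherm prediction) and n_params (number of isotherm parameters) are hypothetical.

```python
import math

def error_functions(q_exp, q_calc, n_params):
    """Common sorption error functions (assumed textbook definitions).

    q_exp    -- measured equilibrium uptakes (must be non-zero)
    q_calc   -- uptakes predicted by the fitted isotherm
    n_params -- number of isotherm parameters (2 or 3 in the abstract)
    """
    n = len(q_exp)
    dof = n - n_params  # degrees of freedom used by HYBRID and MPSD
    resid = [e - c for e, c in zip(q_exp, q_calc)]
    rel = [r / e for r, e in zip(resid, q_exp)]
    return {
        "ERRSQ": sum(r * r for r in resid),
        "EABS": sum(abs(r) for r in resid),
        "ARE": 100.0 / n * sum(abs(v) for v in rel),
        "HYBRID": 100.0 / dof * sum(r * r / e for r, e in zip(resid, q_exp)),
        "MPSD": 100.0 * math.sqrt(sum(v * v for v in rel) / dof),
    }

# Hypothetical data: three measurements, a two-parameter isotherm
scores = error_functions([10.0, 20.0, 40.0], [9.0, 22.0, 40.0], n_params=2)
```

    In a non-linear fit, one of these scores would serve as the objective passed to a numerical minimizer, which is why the choice of error function can change the fitted parameters.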

  9. Applied Multiple Linear Regression: A General Research Strategy

    ERIC Educational Resources Information Center

    Smith, Brandon B.

    1969-01-01

    Illustrates some of the basic concepts and procedures for using regression analysis in experimental design, analysis of variance, analysis of covariance, and curvilinear regression. Applications to evaluation of instruction and vocational education programs are illustrated. (GR)

  10. Solving large mixed linear models using preconditioned conjugate gradient iteration.

    PubMed

    Strandén, I; Lidauer, M

    1999-12-01

    Continuous evaluation of dairy cattle with a random regression test-day model requires a fast solving method and algorithm. A new computing technique feasible in Jacobi and conjugate gradient based iterative methods using iteration on data is presented. In the new computing technique, the calculations in multiplication of a vector by a matrix were reordered to three steps instead of the commonly used two steps. The three-step method was implemented in a general mixed linear model program that used preconditioned conjugate gradient iteration. Performance of this program in comparison to other general solving programs was assessed via estimation of breeding values using univariate, multivariate, and random regression test-day models. Central processing unit time per iteration with the new three-step technique was, at best, one-third that needed with the old technique. Performance was best with the test-day model, which was the largest and most complex model used. The new program did well in comparison to other general software. Programs keeping the mixed model equations in random access memory required at least 20 and 435% more time to solve the univariate and multivariate animal models, respectively. Computations of the second best iteration on data took approximately three and five times longer for the animal and test-day models, respectively, than did the new program. Good performance was due to fast computing time per iteration and quick convergence to the final solutions. Use of preconditioned conjugate gradient based methods in solving large breeding value problems is supported by our findings.
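
    For orientation, a generic preconditioned conjugate gradient loop is sketched below. It assumes a small dense, symmetric positive definite matrix and a Jacobi (diagonal) preconditioner. It does not reproduce the paper's iteration-on-data scheme or its three-step matrix-vector product, and the function name pcg_jacobi is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (dense lists).

    Uses the inverse diagonal of A as the preconditioner. Returns the
    solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x for x = 0
    m_inv = [1.0 / A[i][i] for i in range(len(b))]
    z = [mi * ri for mi, ri in zip(m_inv, r)]     # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)        # the step the paper speeds up on large data
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, iteration
        z = [mi * ri for mi, ri in zip(m_inv, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

solution, iterations = pcg_jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

    Each iteration is dominated by one matrix-vector product, which is consistent with the abstract's focus on reorganizing that multiplication.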

  11. Modeling when and where a secondary accident occurs.

    PubMed

    Wang, Junhua; Liu, Boya; Fu, Ting; Liu, Shuo; Stipancic, Joshua

    2018-01-31

    The occurrence of secondary accidents leads to traffic congestion and road safety issues. Secondary accident prevention has become a major consideration in traffic incident management. This paper investigates the location and time of a potential secondary accident after the occurrence of an initial traffic accident. With accident data and traffic loop data collected over three years from California interstate freeways, a shock wave-based method was introduced to identify secondary accidents. A linear regression model and two machine learning algorithms, including a back-propagation neural network (BPNN) and a least squares support vector machine (LSSVM), were implemented to explore the distance and time gap between the initial and secondary accidents using inputs of crash severity, violation category, weather condition, tow away, road surface condition, lighting, parties involved, traffic volume, duration, and shock wave speed generated by the primary accident. From the results, the linear regression model was inadequate in describing the effect of most variables and its goodness-of-fit and accuracy in prediction was relatively poor. In the training programs, the BPNN and LSSVM demonstrated adequate goodness-of-fit, though the BPNN was superior with a higher CORR and lower MSE. The BPNN model also outperformed the LSSVM in time prediction, while both failed to provide adequate distance prediction. Therefore, the BPNN model could be used to forecast the time gap between initial and secondary accidents, which could be used by decision makers and incident management agencies to prevent or reduce secondary collisions. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Rapid and safe learning of robotic gastrectomy for gastric cancer: multidimensional analysis in a comparison with laparoscopic gastrectomy.

    PubMed

    Kim, H-I; Park, M S; Song, K J; Woo, Y; Hyung, W J

    2014-10-01

    The learning curve of robotic gastrectomy has not yet been evaluated in comparison with the laparoscopic approach. We compared the learning curves of robotic gastrectomy and laparoscopic gastrectomy based on operation time and surgical success. We analyzed 172 robotic and 481 laparoscopic distal gastrectomies performed by a single surgeon from May 2003 to April 2009. The operation time was analyzed using a moving average and non-linear regression analysis. Surgical success was evaluated by a cumulative sum plot with a target failure rate of 10%. Surgical failure was defined as laparoscopic or open conversion, insufficient lymph node harvest for staging, resection margin involvement, postoperative morbidity, and mortality. Moving average and non-linear regression analyses indicated a stable state for operation time at 95 and 121 cases in robotic gastrectomy, and 270 and 262 cases in laparoscopic gastrectomy, respectively. The cumulative sum plot identified no cut-off point for surgical success in robotic gastrectomy and 80 cases in laparoscopic gastrectomy. Excluding the initial 148 laparoscopic gastrectomies that were performed before the first robotic gastrectomy, the two groups showed a similar number of cases to reach steady state in operation time, and showed no cut-off point in analysis of surgical success. The experience of laparoscopic surgery could affect the learning process of robotic gastrectomy. An experienced laparoscopic surgeon requires fewer cases of robotic gastrectomy to reach steady state. Moreover, the surgical outcomes of robotic gastrectomy were satisfactory. Copyright © 2013 Elsevier Ltd. All rights reserved.
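
    The two learning-curve tools mentioned here can be sketched as follows. The abstract does not give the exact cumulative sum formulation, so this assumes the simple form in which each case adds (outcome minus target rate), with outcome 1 for a failure and 0 for a success. The function names and the sample data are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; the first output covers the first `window` cases."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def cusum(failures, target_rate=0.10):
    """Cumulative sum of (failure indicator - target failure rate).

    The curve rises when failures occur more often than the target rate
    and falls when performance is better than the target.
    """
    total = 0.0
    curve = []
    for failed in failures:
        total += (1.0 if failed else 0.0) - target_rate
        curve.append(total)
    return curve

# Hypothetical operation times (minutes) and outcomes for four cases
smoothed = moving_average([300, 280, 260, 240], window=2)
curve = cusum([True, False, False, False])
```

    A sustained downward slope in the curve suggests the failure rate has dropped below the 10% target, which is how a cut-off case number is usually read from such a plot.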

  13. Estimate the contribution of incubation parameters influence egg hatchability using multiple linear regression analysis

    PubMed Central

    Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma

    2016-01-01

    Aim: This research was conducted to determine the parameters that most affect the hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the one with the most influence on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. The Alexandria strain had the highest commercial hatchability (80.70%), a significant difference. Highly significant differences in hatching chick weight were observed among the studied strains. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666

  14. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    PubMed

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Patterns of medicinal plant use: an examination of the Ecuadorian Shuar medicinal flora using contingency table and binomial analyses.

    PubMed

    Bennett, Bradley C; Husby, Chad E

    2008-03-28

    Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.

  16. An Application to the Prediction of LOD Change Based on General Regression Neural Network

    NASA Astrophysics Data System (ADS)

    Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.

    2011-07-01

    Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.

  17. Solving a mixture of many random linear equations by tensor decomposition and alternating minimization.

    DOT National Transportation Integrated Search

    2016-09-01

    We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...

  18. An overview of longitudinal data analysis methods for neurological research.

    PubMed

    Locascio, Joseph J; Atri, Alireza

    2011-01-01

    The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models.

  19. Evaluation of the CEAS model for barley yields in North Dakota and Minnesota

    NASA Technical Reports Server (NTRS)

    Barnett, T. L. (Principal Investigator)

    1981-01-01

    The CEAS yield model is based upon multiple regression analysis at the CRD and state levels. For the historical time series, yield is regressed on a set of variables derived from monthly mean temperature and monthly precipitation. Technological trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-79) demonstrated that biases are small and that performance, as indicated by the root mean square errors, is acceptable for the intended application; however, model response for individual years, particularly unusual years, is not very reliable and shows some large errors. The model is objective, adequate, timely, simple and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.

  20. Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation

    NASA Astrophysics Data System (ADS)

    Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.

    A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC_non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC)_pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC_non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC_non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
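
    Two of the estimators discussed above, plus the tracer equation they feed, can be sketched in a few lines. The Deming slope uses the standard closed form with delta defined as the ratio of the error variance in y to the error variance in x. York regression is not shown. The function names and the sample values are hypothetical, and the secondary OC step assumes the usual tracer relation OC_sec = OC_total - ((OC/EC)_pri * EC + OC_non-comb).

```python
import math

def ratio_of_averages(oc, ec):
    """ROA estimate of the primary OC/EC ratio; implies a zero intercept."""
    return (sum(oc) / len(oc)) / (sum(ec) / len(ec))

def deming(x, y, delta=1.0):
    """Deming regression of y on x.

    delta is the ratio of the error variance in y to the error variance
    in x (delta = 1 gives orthogonal regression). Returns (slope, intercept).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff * diff + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return slope, my - slope * mx

def secondary_oc(oc, ec, ratio_pri, oc_non_comb=0.0):
    """Secondary OC as total OC minus the estimated primary OC."""
    return [o - (ratio_pri * e + oc_non_comb) for o, e in zip(oc, ec)]

# Hypothetical EC and OC samples lying exactly on OC = 2 * EC + 0.5
ec = [1.0, 2.0, 3.0, 4.0]
oc = [2.5, 4.5, 6.5, 8.5]
slope, intercept = deming(ec, oc)   # slope plays the role of (OC/EC)_pri
roa = ratio_of_averages(oc, ec)     # larger than the slope when the intercept is positive
```

    In this toy data the ROA value exceeds the regression slope because the non-zero intercept is absorbed into the ratio, which illustrates why the zero OC_non-comb assumption matters.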

  1. Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators

    PubMed Central

    Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina

    2013-01-01

    Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729

  2. Internet gaming disorder in early adolescence: Associations with parental and adolescent mental health.

    PubMed

    Wartberg, L; Kriston, L; Kramer, M; Schwedler, A; Lincoln, T M; Kammerl, R

    2017-06-01

    Internet gaming disorder (IGD) has been included in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). Currently, associations between IGD in early adolescence and mental health are largely unexplained. In the present study, the relation of IGD with adolescent and parental mental health was investigated for the first time. We surveyed 1095 family dyads (an adolescent aged 12-14 years and a related parent) with a standardized questionnaire for IGD as well as for adolescent and parental mental health. We conducted linear (dimensional approach) and logistic (categorical approach) regression analyses. Both with dimensional and categorical approaches, we observed statistically significant associations between IGD and male gender, a higher degree of adolescent antisocial behavior, anger control problems, emotional distress, self-esteem problems, hyperactivity/inattention and parental anxiety (linear regression model: corrected R² = 0.41, logistic regression model: Nagelkerke's R² = 0.41). IGD appears to be associated with internalizing and externalizing problems in adolescents. Moreover, the findings of the present study provide first evidence that not only adolescent but also parental mental health is relevant to IGD in early adolescence. Adolescent and parental mental health should be considered in prevention and intervention programs for IGD in adolescence. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  3. Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators.

    PubMed

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina

    2013-01-01

    Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.

  4. Hypothesis testing in functional linear regression models with Neyman's truncation and wavelet thresholding for longitudinal data.

    PubMed

    Yang, Xiaowei; Nie, Kun

    2008-03-15

    Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.

  5. Development of non-linear models predicting daily fine particle concentrations using aerosol optical depth retrievals and ground-based measurements at a municipality in the Brazilian Amazon region

    NASA Astrophysics Data System (ADS)

    Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino

    2018-07-01

    Epidemiological studies generally use measurements of particulate matter with diameter less than 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data have considerable potential in predicting PM2.5 concentrations, and thus provide an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region, where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, the sine and cosine of the date with a period of 365.25 days, and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products in combination with testing of non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of the Brazilian Amazon region.
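
    The seasonal terms in this model are a sine and cosine pair with a one-year period. A minimal sketch of how such predictors are typically built is given below. The function name seasonal_terms and the day_index argument (days elapsed since an arbitrary reference date) are hypothetical, and the paper's interaction terms are not reproduced.

```python
import math

def seasonal_terms(day_index, period=365.25):
    """Return (sin, cos) of the date expressed as a phase within one period.

    Using both terms lets a linear model represent an annual cycle of any
    phase, since a*sin(t) + b*cos(t) is a shifted sinusoid.
    """
    angle = 2.0 * math.pi * day_index / period
    return math.sin(angle), math.cos(angle)

# Predictors for the reference day and for a quarter of a year later
start = seasonal_terms(0)
quarter = seasonal_terms(365.25 / 4)
```

    Each daily record would carry these two values as extra columns, alone or multiplied by variables such as temperature or humidity to form interaction terms.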

  6. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    USGS Publications Warehouse

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-01-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
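
    The contrast between pooled and fixed-effects panel structures can be illustrated with a single predictor. The sketch below is not the authors' model: it assumes one explanatory variable (for example log rainfall) and one response (for example log low flow), and removes each stream's mean before estimating the slope, which is the standard within transformation. The function names and data are hypothetical.

```python
from collections import defaultdict

def pooled_slope(x, y):
    """Ordinary least squares slope ignoring which stream each row came from."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def fixed_effects_slope(groups, x, y):
    """Within estimator: demean x and y per group, then fit one common slope."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        entry = sums[g]
        entry[0] += xi
        entry[1] += yi
        entry[2] += 1
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sx, sy, n = sums[g]
        dx, dy = xi - sx / n, yi - sy / n
        num += dx * dy
        den += dx * dx
    return num / den

# Two hypothetical streams with the same elasticity (0.5) but different baselines
stream = ["a", "a", "a", "b", "b", "b"]
rain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
flow = [1.5, 2.0, 2.5, 0.0, 0.5, 1.0]
within = fixed_effects_slope(stream, rain, flow)  # recovers 0.5
pooled = pooled_slope(rain, flow)                 # distorted by the baseline gap
```

    In this toy case the pooled slope even has the wrong sign, because unobserved differences between streams are confounded with rainfall; removing stream means isolates the temporal response.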

  7. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    NASA Astrophysics Data System (ADS)

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-12-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.

  8. Multiplicative Forests for Continuous-Time Processes

    PubMed Central

    Weiss, Jeremy C.; Natarajan, Sriraam; Page, David

    2013-01-01

    Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability. PMID:25284967

  9. Multiplicative Forests for Continuous-Time Processes.

    PubMed

    Weiss, Jeremy C; Natarajan, Sriraam; Page, David

    2012-01-01

    Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.

  10. Tool for Forecasting Cool-Season Peak Winds Across Kennedy Space Center and Cape Canaveral Air Force Station

    NASA Technical Reports Server (NTRS)

    Barrett, Joe H., III; Roeder, William P.

    2010-01-01

    The expected peak wind speed for the day is an important element in the daily morning forecast for ground and space launch operations at Kennedy Space Center (KSC) and Cape Canaveral Air Force Station (CCAFS). The 45th Weather Squadron (45 WS) must issue forecast advisories for KSC/CCAFS when they expect peak gusts for >= 25, >= 35, and >= 50 kt thresholds at any level from the surface to 300 ft. In Phase I of this task, the 45 WS tasked the Applied Meteorology Unit (AMU) to develop a cool-season (October - April) tool to help forecast the non-convective peak wind from the surface to 300 ft at KSC/CCAFS. During the warm season, these wind speeds are rarely exceeded except during convective winds or under the influence of tropical cyclones, for which other techniques are already in use. The tool used single and multiple linear regression equations to predict the peak wind from the morning sounding. The forecaster manually entered several observed sounding parameters into a Microsoft Excel graphical user interface (GUI), and then the tool displayed the forecast peak wind speed, average wind speed at the time of the peak wind, the timing of the peak wind and the probability the peak wind will meet or exceed 35, 50 and 60 kt. The 45 WS customers later dropped the requirement for >= 60 kt wind warnings. During Phase II of this task, the AMU expanded the period of record (POR) by six years to increase the number of observations used to create the forecast equations. A large number of possible predictors were evaluated from archived soundings, including inversion depth and strength, low-level wind shear, mixing height, temperature lapse rate and winds from the surface to 3000 ft. Each day in the POR was stratified in a number of ways, such as by low-level wind direction, synoptic weather pattern, precipitation and Bulk Richardson number. The most accurate Phase II equations were then selected for an independent verification. 
    The Phase I and II forecast methods were compared using an independent verification data set. The two methods were compared to climatology, wind warnings and advisories issued by the 45 WS, and North American Mesoscale (NAM) model (MesoNAM) forecast winds. The performance of the Phase I and II methods was similar with respect to mean absolute error. Since the Phase I data were not stratified by precipitation, this method's peak wind forecasts had a large negative bias on days with precipitation and a small positive bias on days with no precipitation. Overall, the climatology methods performed the worst while the MesoNAM performed the best. Since the MesoNAM winds were the most accurate in the comparison, the final version of the tool was based on the MesoNAM winds. The probabilities that the peak wind will meet or exceed the warning thresholds were based on the one standard deviation error bars from the linear regression. For example, the linear regression might forecast the most likely peak speed to be 35 kt, and the error bars would then be used to calculate that the probability of >= 25 kt = 76%, the probability of >= 35 kt = 50%, and the probability of >= 50 kt = 19%. The authors have not seen this application of linear regression error bars in any other meteorological applications. Although probability forecast tools should usually be developed with logistic regression, this technique could be easily generalized to any linear regression forecast tool to estimate the probability of exceeding any desired threshold. This could be useful for previously developed linear regression forecast tools or new forecast applications where statistical analysis software to perform logistic regression is not available. The tool was delivered in two formats - a Microsoft Excel GUI and a Tool Command Language/Tool Kit (Tcl/Tk) GUI in the Meteorological Interactive Data Display System (MIDDS).
    The Microsoft Excel GUI reads a MesoNAM text file containing hourly forecasts from 0 to 84 hours, from one model run (00 or 12 UTC). The GUI then displays the peak wind speed, average wind speed, and the probability the peak wind will meet or exceed the 25-, 35- and 50-kt thresholds. The user can display the Day-1 through Day-3 peak wind forecasts, and separate forecasts are made for precipitation and non-precipitation days. The MIDDS GUI uses data from the NAM and Global Forecast System (GFS), instead of the MesoNAM. It can display Day-1 and Day-2 forecasts using NAM data, and Day-1 through Day-5 forecasts using GFS data. The timing of the peak wind is not displayed, since the independent verification showed that none of the forecast methods performed significantly better than climatology. The forecaster should use the climatological timing of the peak wind (2248 UTC) as a first guess and then adjust it based on the movement of weather features.
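
    One way to turn a regression forecast and its one-standard-deviation error into threshold-exceedance probabilities is sketched below. It assumes Gaussian forecast errors, which the abstract does not state explicitly, and the standard deviation of 15 kt is a hypothetical value chosen only for illustration.

```python
import math


def exceedance_probability(forecast_kt, sigma_kt, threshold_kt):
    """P(peak wind >= threshold), assuming Gaussian forecast errors."""
    z = (threshold_kt - forecast_kt) / sigma_kt
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


forecast = 35.0  # most likely peak speed from the regression (kt)
sigma = 15.0     # hypothetical one-standard-deviation error (kt)
probabilities = {t: exceedance_probability(forecast, sigma, t)
                 for t in (25.0, 35.0, 50.0)}
```

    By construction the probability at the forecast value itself is 50%, as in the abstract's example; the probabilities at the other thresholds depend on the assumed error spread.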

  11. Relationship between masticatory performance using a gummy jelly and masticatory movement.

    PubMed

    Uesugi, Hanako; Shiga, Hiroshi

    2017-10-01

    The purpose of this study was to clarify the relationship between masticatory performance using a gummy jelly and masticatory movement. Thirty healthy males were asked to chew a gummy jelly on their habitual chewing side for 20s, and the parameters of masticatory performance and masticatory movement were calculated as follows. For evaluating the masticatory performance, the amount of glucose extraction during chewing of a gummy jelly was measured. For evaluating the masticatory movement, the movement of the mandibular incisal point was recorded using the MKG K6-I, and ten parameters of the movement path (opening distance and masticatory width), movement rhythm (opening time, closing time, occluding time, and cycle time), stability of movement (stability of path and stability of rhythm), and movement velocity (opening maximum velocity and closing maximum velocity) were calculated from 10 cycles of chewing beginning with the fifth cycle. The relationship between the amount of glucose extraction and parameters representing masticatory movement was investigated and then stepwise multiple linear regression analysis was performed. The amount of glucose extraction was associated with 7 parameters representing the masticatory movement. Stepwise multiple linear regression analysis showed that the opening distance, closing time, stability of rhythm, and closing maximum velocity were the most important factors affecting the glucose extraction. From these results it was suggested that there was a close relation between masticatory performance and masticatory movement, and that the masticatory performance could be increased by rhythmic, rapid and stable mastication with a large opening distance. Copyright © 2017 Japan Prosthodontic Society. Published by Elsevier Ltd. All rights reserved.

  12. Data-driven discovery of partial differential equations.

    PubMed

    Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan

    2017-04-01

    We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
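
    The idea of selecting a few active terms from a library of candidates can be sketched with sequentially thresholded least squares, used here as one common sparsity-promoting choice rather than as the authors' exact procedure. The example uses an analytic solution of the diffusion equation with analytic derivatives (real data would need numerical differentiation); the candidate library, threshold and names are assumptions.

```python
import numpy as np


def stlsq(theta, u_t, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: u_t ~ theta @ xi with sparse xi."""
    xi = np.linalg.lstsq(theta, u_t, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break
        # Refit using only the terms that survived the threshold.
        xi[big] = np.linalg.lstsq(theta[:, big], u_t, rcond=None)[0]
    return xi


# Two-mode solution of u_t = D * u_xx on a periodic domain.
D = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
t = np.linspace(0.0, 1.0, 32)
X, T = np.meshgrid(x, t)
a1, a2 = np.exp(-D * T), np.exp(-4.0 * D * T)
u = a1 * np.sin(X) + a2 * np.sin(2 * X)
u_x = a1 * np.cos(X) + 2 * a2 * np.cos(2 * X)
u_xx = -a1 * np.sin(X) - 4 * a2 * np.sin(2 * X)
u_t = -D * a1 * np.sin(X) - 4 * D * a2 * np.sin(2 * X)

# Candidate library: 1, u, u_x, u_xx, u * u_x (flattened over space and time).
theta = np.column_stack([np.ones(u.size), u.ravel(), u_x.ravel(),
                         u_xx.ravel(), (u * u_x).ravel()])
xi = stlsq(theta, u_t.ravel())
```

    In this noise-free setting only the coefficient on u_xx should remain non-zero, recovering the diffusion coefficient; with noisy measurements the threshold becomes a tuning parameter that trades model complexity against fit.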

  13. Stratification for the propensity score compared with linear regression techniques to assess the effect of treatment or exposure.

    PubMed

    Senn, Stephen; Graf, Erika; Caputo, Angelika

    2007-12-30

    Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence the coarsening property of the propensity score (adjustment for a one-dimensional confounder instead of a high-dimensional covariate) may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.

  14. [Optimal extraction of effective constituents from Aralia elata by central composite design and response surface methodology].

    PubMed

    Lv, Shao-Wa; Liu, Dong; Hu, Pan-Pan; Ye, Xu-Yan; Xiao, Hong-Bin; Kuang, Hai-Xue

    2010-03-01

    To optimize the process of extracting effective constituents from Aralia elata by response surface methodology. The independent variables were ethanol concentration, reflux time and solvent fold; the dependent variable was the extraction rate of total saponins in Aralia elata. Linear or non-linear mathematical models were used to estimate the relationship between independent and dependent variables. Response surface methodology was used to optimize the process of extraction. The prediction was carried out by comparing the observed and predicted values. The regression coefficient of the binomial fitting complex model was as high as 0.9617; the optimum conditions of the extraction process were 70% ethanol, 2.5 hours for reflux, 20-fold solvent and 3 times for extraction. The bias between observed and predicted values was -2.41%. This shows that the optimum model is highly predictive.

  15. Method selection and adaptation for distributed monitoring of infectious diseases for syndromic surveillance.

    PubMed

    Xing, Jian; Burkom, Howard; Tokars, Jerome

    2011-12-01

    Automated surveillance systems require statistical methods to recognize increases in visit counts that might indicate an outbreak. In prior work we presented methods to enhance the sensitivity of C2, a commonly used time series method. In this study, we compared the enhanced C2 method with five regression models. We used emergency department chief complaint data from US CDC BioSense surveillance system, aggregated by city (total of 206 hospitals, 16 cities) during 5/2008-4/2009. Data for six syndromes (asthma, gastrointestinal, nausea and vomiting, rash, respiratory, and influenza-like illness) was used and was stratified by mean count (1-19, 20-49, ≥50 per day) into 14 syndrome-count categories. We compared the sensitivity for detecting single-day artificially-added increases in syndrome counts. Four modifications of the C2 time series method, and five regression models (two linear and three Poisson), were tested. A constant alert rate of 1% was used for all methods. Among the regression models tested, we found that a Poisson model controlling for the logarithm of total visits (i.e., visits both meeting and not meeting a syndrome definition), day of week, and 14-day time period was best. Among 14 syndrome-count categories, time series and regression methods produced approximately the same sensitivity (<5% difference) in 6; in six categories, the regression method had higher sensitivity (range 6-14% improvement), and in two categories the time series method had higher sensitivity. When automated data are aggregated to the city level, a Poisson regression model that controls for total visits produces the best overall sensitivity for detecting artificially added visit counts. This improvement was achieved without increasing the alert rate, which was held constant at 1% for all methods. These findings will improve our ability to detect outbreaks in automated surveillance system data. Published by Elsevier Inc.

  16. Understanding Child Stunting in India: A Comprehensive Analysis of Socio-Economic, Nutritional and Environmental Determinants Using Additive Quantile Regression

    PubMed Central

    Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.

    2013-01-01

    Background: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design: Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839

  17. Understanding child stunting in India: a comprehensive analysis of socio-economic, nutritional and environmental determinants using additive quantile regression.

    PubMed

    Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A

    2013-01-01

    Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.

  18. Visual field progression in glaucoma: estimating the overall significance of deterioration with permutation analyses of pointwise linear regression (PoPLR).

    PubMed

    O'Leary, Neil; Chauhan, Balwantray C; Artes, Paul H

    2012-10-01

    To establish a method for estimating the overall statistical significance of visual field deterioration from an individual patient's data, and to compare its performance to pointwise linear regression. The Truncated Product Method was used to calculate a statistic S that combines evidence of deterioration from individual test locations in the visual field. The overall statistical significance (P value) of visual field deterioration was inferred by comparing S with its permutation distribution, derived from repeated reordering of the visual field series. Permutation of pointwise linear regression (PoPLR) and pointwise linear regression were evaluated in data from patients with glaucoma (944 eyes, median mean deviation -2.9 dB, interquartile range: -6.3, -1.2 dB) followed for more than 4 years (median 10 examinations over 8 years). False-positive rates were estimated from randomly reordered series of this dataset, and hit rates (proportion of eyes with significant deterioration) were estimated from the original series. The false-positive rates of PoPLR were indistinguishable from the corresponding nominal significance levels and were independent of baseline visual field damage and length of follow-up. At P < 0.05, the hit rates of PoPLR were 12, 29, and 42%, at the fifth, eighth, and final examinations, respectively, and at matching specificities they were consistently higher than those of pointwise linear regression. In contrast to population-based progression analyses, PoPLR provides a continuous estimate of statistical significance for visual field deterioration individualized to a particular patient's data. This allows close control over specificity, essential for monitoring patients in clinical practice and in clinical trials.

  19. A Model Comparison for Count Data with a Positively Skewed Distribution with an Application to the Number of University Mathematics Courses Completed

    ERIC Educational Resources Information Center

    Liou, Pey-Yan

    2009-01-01

    The current study examines three regression models: OLS (ordinary least squares) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…

  20. Strengthen forensic entomology in court--the need for data exploration and the validation of a generalised additive mixed model.

    PubMed

    Baqué, Michèle; Amendt, Jens

    2013-01-01

    Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets do not take into account that immature blow flies grow in a non-linear fashion. Linear models do not provide sufficiently reliable age estimates and may even lead to an erroneous determination of the PMI(min). In line with the Daubert standard and the need for improvements in forensic science, new statistical tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data (to assure their quality and to show the importance of checking them carefully prior to conducting the statistical tests) and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using generalised additive mixed models for the first time.

  1. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    PubMed

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  2. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    NASA Astrophysics Data System (ADS)

    Gusriani, N.; Firdaniza

    2018-03-01

    The presence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least squares method is nevertheless applied to such data, it produces a model that cannot represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.

  3. Evaluation of nonpoint-source contamination, Wisconsin; selected streamwater-quality data, land-use and best-management practices inventory, and quality assurance and quality control, water year 1993

    USGS Publications Warehouse

    Corsi, Steven R.; Walker, John F.; Graczyk, D.J.; Greb, S.R.; Owens, D.W.; Rappold, K.F.

    1995-01-01

    A special study was done to determine the effect of holding time on fecal coliform colony counts. A linear regression indicated that the mean decrease in colony counts over 72 hours was 8.2 percent per day. Results after 24 hours showed that colony counts increased in some samples and decreased in others.

  4. Spectral Regression Discriminant Analysis for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Pan, Y.; Wu, J.; Huang, H.; Liu, J.

    2012-08-01

    Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for Hyperspectral Image Classification. Manifold learning methods, such as Locally Linear Embedding, Isomap, and Laplacian Eigenmap, are popular for dimensionality reduction. However, a disadvantage of many manifold learning methods is that their computations usually involve eigen-decomposition of dense matrices, which is expensive in both time and memory. In this paper, we introduce a new dimensionality reduction method, called Spectral Regression Discriminant Analysis (SRDA). SRDA casts the problem of learning an embedding function into a regression framework, which avoids eigen-decomposition of dense matrices. Also, with the regression-based framework, different kinds of regularizers can be naturally incorporated into our algorithm, which makes it more flexible. It can make efficient use of data points to discover the intrinsic discriminant structure in the data. Experimental results on Washington DC Mall and AVIRIS Indian Pines hyperspectral data sets demonstrate the effectiveness of the proposed method.

  5. The association between meteorological factors and road traffic injuries: a case analysis from Shantou city, China

    PubMed Central

    Gao, Jinghong; Chen, Xiaojun; Woodward, Alistair; Liu, Xiaobo; Wu, Haixia; Lu, Yaogui; Li, Liping; Liu, Qiyong

    2016-01-01

    Few studies have examined the associations of meteorological factors with road traffic injuries (RTIs). The purpose of the present study was to quantify the contributions of meteorological factors to RTI cases treated at a tertiary level hospital in Shantou city, China. A time-series diagram was employed to illustrate the time trends and seasonal variation of RTIs, and correlation analysis and multiple linear regression analysis were conducted to investigate the relationships between meteorological parameters and RTIs. RTIs followed a seasonal pattern as more cases occurred during summer and winter months. RTIs were positively correlated with temperature and sunshine duration, and negatively associated with wind speed. Temperature, sunshine hours and wind speed were included in the final linear model with regression coefficients of 0.65 (t = 2.36, P = 0.019), 2.23 (t = 2.72, P = 0.007) and −27.66 (t = −5.67, P < 0.001), respectively, accounting for 19.93% of the total variation of RTI cases. The findings can help us better understand the associations between meteorological factors and RTIs, and may contribute to the development and implementation of regional level evidence-based weather-responsive traffic management systems in the future. PMID:27853316

  6. Distinguishing Gasoline Engine Oils of Different Viscosities Using Terahertz Time-Domain Spectroscopy

    NASA Astrophysics Data System (ADS)

    Adbul-Munaim, Ali Mazin; Reuter, Marco; Koch, Martin; Watson, Dennis G.

    2015-07-01

    Terahertz-time-domain spectroscopy (THz-TDS) in the range of 0.5-2.0 THz was evaluated for distinguishing among gasoline engine oils of three different grades (SAE 5W-20, 10W-40, and 20W-50) from the same manufacturer. Absorption coefficient showed limited potential and only distinguished (p < 0.05) the 20W-50 grade from the other two grades in the 1.7-2.0-THz range. Refractive index data demonstrated relatively flat and consistently spaced curves for the three oil grades. ANOVA results confirmed a highly significant difference (p < 0.0001) in refractive index among each of the three oils across the 0.5-2.0-THz range. Linear regression was applied to refractive index data at 0.25-THz intervals from 0.5 to 2.0 THz to predict kinematic viscosity. All seven linear regression models, intercepts, and refractive index coefficients were highly significant (p < 0.0001). All models had a similar fit with R2 ranging from 0.9773 to 0.9827 and RMSE ranging from 6.33 to 7.75. The refractive indices at 1.25 THz produced the best fit. The refractive indices of these oil samples were promising for identification and distinction of oil grades.

  7. Estimation of the quantification uncertainty from flow injection and liquid chromatography transient signals in inductively coupled plasma mass spectrometry

    NASA Astrophysics Data System (ADS)

    Laborda, Francisco; Medrano, Jesús; Castillo, Juan R.

    2004-06-01

    The quality of the quantitative results obtained from transient signals in high-performance liquid chromatography-inductively coupled plasma mass spectrometry (HPLC-ICPMS) and flow injection-inductively coupled plasma mass spectrometry (FI-ICPMS) was investigated under multielement conditions. Quantification methods were based on multiple-point calibration by simple and weighted linear regression, and double-point calibration (measurement of the baseline and one standard). An uncertainty model, which includes the main sources of uncertainty from FI-ICPMS and HPLC-ICPMS (signal measurement, sample flow rate and injection volume), was developed to estimate peak area uncertainties and statistical weights used in weighted linear regression. The behaviour of the ICPMS instrument was characterized so that it could be considered in the model, leading to the conclusion that the instrument works as a concentration detector when it is used to monitor transient signals from flow injection or chromatographic separations. Proper quantification by the three calibration methods was achieved when compared to reference materials, and the double-point calibration gave results of the same quality as the multiple-point calibration while shortening the calibration time. Relative expanded uncertainties ranged from 10-20% for concentrations around the LOQ to 5% for concentrations higher than 100 times the LOQ.
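
    The weighted calibration described in this record can be illustrated with a minimal sketch. It is not the authors' uncertainty model: the concentrations, peak areas and uncertainties below are hypothetical, and the weights are assumed to be the inverse squared uncertainties of each peak area.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares line y = intercept + slope * x.

    w holds the statistical weight of each point (assumed here: 1 / u_i**2).
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical multiple-point calibration (not data from the paper).
conc = [0.0, 1.0, 5.0, 10.0, 50.0]        # standard concentrations
area = [2.0 + 3.0 * c for c in conc]      # peak areas, exactly linear here
u_area = [0.1, 0.1, 0.3, 0.5, 2.0]        # assumed peak-area uncertainties
weights = [1.0 / u ** 2 for u in u_area]

intercept, slope = weighted_linear_fit(conc, area, weights)

# Invert the calibration line to quantify an unknown sample.
unknown_area = 32.0
unknown_conc = (unknown_area - intercept) / slope
```

    Setting all weights to 1 reduces the same function to simple (unweighted) linear regression, the other multiple-point method mentioned in the abstract.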

  8. Analyses of Field Test Data at the Atucha-1 Spent Fuel Pools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sitaraman, S.

    A field test was conducted at the Atucha-1 spent nuclear fuel pools to validate a software package for gross defect detection that is used in conjunction with the inspection tool, Spent Fuel Neutron Counter (SFNC). A set of measurements was taken with the SFNC and the software predictions were compared with these data and analyzed. The data spanned a wide range of cooling times and a set of burnup levels leading to count rates from the several hundreds to around twenty per second. The current calibration in the software using linear fitting required the use of multiple calibration factors to cover the entire range of count rates recorded. The solution to this was to use power regression data fitting to normalize the predicted response and derive one calibration factor that can be applied to the entire set of data. The resulting comparisons between the predicted and measured responses were generally good and provided a quantitative method of detecting missing fuel in virtually all situations. Since the current version of the software uses the linear calibration method, it would need to be updated with the new power regression method to make it more user-friendly for real time verification and fieldable for the range of responses that will be encountered.
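
    As a rough illustration of the power regression idea, the sketch below fits y = a * x**b by ordinary least squares on log-transformed values. This is an assumption about what "power regression" means here, not the method implemented in the software package, and the count rates are made up.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by a straight-line fit of ln(y) against ln(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b


# Hypothetical predicted vs. measured count rates (counts per second).
predicted = [20.0, 50.0, 100.0, 400.0, 800.0]
measured = [1.5 * p ** 0.9 for p in predicted]  # synthetic power-law response

a, b = fit_power_law(predicted, measured)
```

    A single pair (a, b) then covers the whole range of count rates, which is the practical advantage the abstract attributes to the power fit over several piecewise linear calibration factors.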

  9. Plotting of Ethylene Glycol Blood Concentrations Using Linear Regression before and during Hemodialysis in a Case of Intoxication and Pharmacokinetic Review.

    PubMed

    Kim, Youngho

    2015-01-01

    Introduction. As blood concentration measurement of commonly abused alcohols is readily available, an equation was proposed in a previous publication to predict the change in their concentration. The change in ethylene glycol (EG) concentrations was studied in a case of intoxication to estimate the time required for hemodialysis (HD) using linear regression. Case Report. A 55-year-old female with past medical history of seizure disorder, bipolar disorder, and chronic pain was admitted due to severe agitation. The patient was noted to have metabolic acidosis with elevated anion gap and acute kidney injury, which prompted blood concentration measurement of commonly abused alcohols. Her initial EG concentration was 26.45 mmol/L. Fomepizole therapy was initiated, soon followed by HD to enhance clearance. Discussion. Plotting the natural logarithm of EG concentrations over time showed that EG elimination follows first-order kinetics and predicts the change in its concentration well. Pharmacokinetic review revealed minimal elimination of EG by alcohol dehydrogenase (ADH), which could be related to genetic predisposition for ADH activity and home medications as well as the presence of propylene glycol. The pharmacokinetics of EG is relatively well studied with published parameters. Consideration and application of pharmacokinetics could assist in management of EG intoxication including HD planning.
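
    The log-linear approach in this record can be sketched as follows. Under first-order kinetics, ln C(t) = ln C0 - k*t, so a straight-line fit of ln C against time gives the elimination rate constant k. Only the initial concentration (26.45 mmol/L) comes from the abstract; the rate constant, sampling times and target concentration are hypothetical.

```python
import math


def fit_first_order(times_h, conc):
    """Fit ln(C) = ln(C0) - k*t by least squares; return (C0, k)."""
    ln_c = [math.log(c) for c in conc]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(ln_c) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, ln_c))
             / sum((t - t_mean) ** 2 for t in times_h))
    c0 = math.exp(y_mean - slope * t_mean)
    return c0, -slope  # k is the negative of the fitted slope


K_ASSUMED = 0.25                       # hypothetical rate constant, 1/h
times_h = [0.0, 1.0, 2.0, 3.0, 4.0]    # hypothetical sampling times, h
conc = [26.45 * math.exp(-K_ASSUMED * t) for t in times_h]  # mmol/L

c0, k = fit_first_order(times_h, conc)
half_life_h = math.log(2) / k

# Time needed to reach a hypothetical target concentration.
target = 3.2                           # mmol/L, illustrative only
hours_needed = math.log(c0 / target) / k
```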

  10. Emotional exhaustion and cognitive performance in apparently healthy teachers: a longitudinal multi-source study.

    PubMed

    Feuerhahn, Nicolas; Stamov-Roßnagel, Christian; Wolfram, Maren; Bellingrath, Silja; Kudielka, Brigitte M

    2013-10-01

    We investigate how emotional exhaustion (EE), the core component of burnout, relates to cognitive performance, job performance and health. Cognitive performance was assessed by self-rated cognitive stress symptoms, self-rated and peer-rated cognitive impairments in everyday tasks and a neuropsychological test of learning and memory (LGT-3); job performance and physical health were gauged by self-reports. Cross-sectional linear regression analyses in a sample of 100 teachers confirm that EE is negatively related to cognitive performance as assessed by self-rating and peer-rating as well as neuropsychological testing (all p < .05). Longitudinal linear regression analyses confirm similar trends (p < .10) for self-rated and peer-rated cognitive performance. Executive control deficits might explain impaired cognitive performance in EE. In longitudinal analyses, EE also significantly predicts physical health. Contrary to our expectations, EE does not affect job performance. When reversed causation is tested, none of the outcome variables at Time 1 predict EE at Time 2. This speaks against cognitive dysfunctioning serving as a vulnerability factor for exhaustion. In sum, results underpin the negative consequences of EE for cognitive performance and health, which are relevant for individuals and organizations alike. In this way, findings might contribute to the understanding of the burnout syndrome. Copyright © 2012 John Wiley & Sons, Ltd.

  11. Orthogonal Projection in Teaching Regression and Financial Mathematics

    ERIC Educational Resources Information Center

    Kachapova, Farida; Kachapov, Ilias

    2010-01-01

    Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
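
    The geometric view mentioned in this record can be checked numerically. In the sketch below (hypothetical data, not taken from the article), the least-squares fitted values are the orthogonal projection of y onto the span of the constant vector and x, so the residual vector is orthogonal to both and the sums of squares obey Pythagoras' theorem.

```python
def ols_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical observations

a, b = ols_line(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Orthogonality: the residual is perpendicular to both basis vectors.
dot_ones = sum(resid)
dot_x = sum(r * xi for r, xi in zip(resid, x))

# Pythagoras: total SS = explained SS + residual SS.
y_mean = sum(y) / len(y)
ss_total = sum((yi - y_mean) ** 2 for yi in y)
ss_model = sum((fi - y_mean) ** 2 for fi in fitted)
ss_resid = sum(r ** 2 for r in resid)
```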

  12. Logistic models--an odd(s) kind of regression.

    PubMed

    Jupiter, Daniel C

    2013-01-01

    The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  13. No increase in small-solute transport in peritoneal dialysis patients treated without hypertonic glucose for fifty-four months.

    PubMed

    Pagniez, Dominique; Duhamel, Alain; Boulanger, Eric; Lessore de Sainte Foy, Celia; Beuscart, Jean-Baptiste

    2017-08-31

    Glucose is widely used as an osmotic agent in peritoneal dialysis (PD), but exerts untoward effects on the peritoneum. The potential protective effect of a reduced exposure to hypertonic glucose has never been investigated. The cohort of PD patients attending our center, which tackled the challenge of a restricted use of hypertonic glucose solutions, has been prospectively followed since 1992. Small-solute transport was assessed using an equivalent of the glucose peritoneal equilibration test after 6 months, and then every year. The study was stopped on July 1st, 2008, before use of biocompatible solutions. Repeated measures in patients treated with PD for 54 months were analyzed by using (1) the slopes of the linear regression for D4/D0 ratios over time computed for each individual, and (2) a linear mixed model. In the study period, 44 patients were treated for a total of 2376 months, 2058 without hypertonic glucose. There was one episode of peritoneal infection every 18 patient-months. The mean of slopes of the linear regression for D4/D0 ratios was found to be significantly positive (Student's test, p < .001) and the results of the mixed model reflected a similar significant increase for D4/D0 ratios over time. These results reflected a significant decrease of small-solute transport. In this large series, minimizing the use of hypertonic glucose solutions was associated in patients on long term PD with an overall decrease of small-solute transport within 54 months, despite a high rate of peritoneal infection.
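
    The first analysis in this record (one regression slope per patient, then a one-sample Student's test on the slopes) can be sketched as below. The three patients and their D4/D0 values are hypothetical, and only the t statistic is computed; obtaining a p-value would require a t distribution with n - 1 degrees of freedom, which is left out of this sketch.

```python
import math
import statistics


def ols_slope(t, y):
    """Least-squares slope of y against t."""
    mt = sum(t) / len(t)
    my = sum(y) / len(y)
    return (sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
            / sum((ti - mt) ** 2 for ti in t))


months = [6, 18, 30, 42, 54]   # test after 6 months, then every year

# Hypothetical D4/D0 ratios for three patients (not study data).
patients = {
    "A": [0.30 + 0.001 * m for m in months],
    "B": [0.28 + 0.002 * m for m in months],
    "C": [0.33 + 0.003 * m for m in months],
}

slopes = [ols_slope(months, ratios) for ratios in patients.values()]

# One-sample t statistic for H0: mean slope = 0.
mean_slope = statistics.mean(slopes)
sd_slope = statistics.stdev(slopes)
t_stat = mean_slope / (sd_slope / math.sqrt(len(slopes)))
```

    This two-stage summary treats every patient's slope as equally precise; the linear mixed model that the authors also report avoids that simplification.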

  14. Development and Application of Nonlinear Land-Use Regression Models

    NASA Astrophysics Data System (ADS)

    Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel

    2014-05-01

    The problem of air pollution modelling in urban zones is of great importance from both scientific and applied points of view. At present there are several fundamental approaches based either on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. family of kriging models or conditional stochastic simulations). Recently, there were important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of 10-100. It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most LUR models currently used are linear models. In the present research, nonlinear LUR models are developed and applied for Geneva city. Mainly two nonlinear data-driven models were elaborated: multilayer perceptron and random forest. An important part of the research also deals with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). The study focuses on nitrogen dioxide (NO2). It has effects on human health and on plants; NO2 contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxide on plants are the reduction of growth, production and pesticide resistance. Finally, regarding effects on materials, nitrogen dioxide increases corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted the testing set. Missing data have been completed using multiple linear regression and annual average values of pollutant concentrations were computed. All sensors are dispersed homogeneously over the central urban area of Geneva. The main result of the study is that the nonlinear LUR models developed have demonstrated their efficiency in modelling complex phenomena of air pollution in urban zones and significantly reduced the testing error in comparison with linear models. Further research deals with the development and application of other non-linear data-driven models (Kanevski et al. 2009). References Kanevski M., Pozdnoukhov A. and Timonin V. (2009). Machine Learning for Spatial Environmental Data. Theory, Applications and Software. EPFL Press, Lausanne.

  15. Analysis of Learning Curve Fitting Techniques.

    DTIC Science & Technology

    1987-09-01

    1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et al., Applied

  16. On vertical profile of ozone at Syowa

    NASA Technical Reports Server (NTRS)

    Chubachi, Shigeru

    1994-01-01

    The difference in the vertical ozone profile at Syowa between 1966-1981 and 1982-1988 is shown. The month-height cross section of the slope of the linear regressions between ozone partial pressure and 100-mb temperature is also shown. The vertically integrated values of the slopes are in close agreement with the slopes calculated by linear regression of Dobson total ozone on 100-mb temperature in the period of 1982-1988.

  17. Binding affinity toward human prion protein of some anti-prion compounds - Assessment based on QSAR modeling, molecular docking and non-parametric ranking.

    PubMed

    Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija

    2018-01-01

    The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrP^C) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Classification of sodium MRI data of cartilage using machine learning.

    PubMed

    Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R

    2015-11-01

    To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.

  19. Does Nonlinear Modeling Play a Role in Plasmid Bioprocess Monitoring Using Fourier Transform Infrared Spectra?

    PubMed

    Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M

    2017-06-01

    The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.

  20. Developing a dengue forecast model using machine learning: A case study in China.

    PubMed

    Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun

    2017-10-01

    In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
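
    The two performance measures named in this record, RMSE and R-squared, can be computed as in the sketch below. The weekly case counts are hypothetical, and R-squared is taken here as 1 - SSE/SST, which is one common definition; the paper does not state which variant it used.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical weekly case counts and model forecasts.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
```

    Computing both measures on held-out weeks, rather than on the weeks used for fitting, is what allows candidate models such as SVR and LASSO to be compared fairly.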
