Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
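The full-versus-reduced-model idea mentioned in this abstract can be sketched for one particular hypothesis. The abstract does not list the five hypotheses, so the sketch assumes the null hypothesis that all groups share a single line (same intercept and slope). It uses a plain extra-sum-of-squares F statistic rather than the contrast approach the paper presents. The function names and the data are hypothetical.

```python
def sse_line(x, y):
    """Residual sum of squares of the least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def coincidence_f(groups):
    """F statistic for H0: every group follows one common line.

    groups is a list of (x, y) pairs, one pair of lists per regression.
    Full model: a separate intercept and slope per group (2k parameters).
    Reduced model: one intercept and one slope for the pooled data.
    """
    k = len(groups)
    n = sum(len(x) for x, _ in groups)
    sse_full = sum(sse_line(x, y) for x, y in groups)
    pooled_x = [xi for x, _ in groups for xi in x]
    pooled_y = [yi for _, y in groups for yi in y]
    sse_reduced = sse_line(pooled_x, pooled_y)
    df_num = 2 * (k - 1)   # parameters dropped by the reduced model
    df_den = n - 2 * k     # residual degrees of freedom of the full model
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_stat, df_num, df_den


# Made-up illustrative data, not taken from the paper
group_a = ([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
group_b = ([1, 2, 3, 4, 5], [5.1, 5.9, 7.2, 7.8, 9.0])
f_stat, df_num, df_den = coincidence_f([group_a, group_b])
```

The statistic would be compared with an F distribution on `df_num` and `df_den` degrees of freedom. Other hypotheses, such as equal slopes only, change which parameters the reduced model shares.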
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed to assess the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and to find a simple model of the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by echocardiography and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). To identify the factors causing pulmonary hypertension, the statistical method of comparing arithmetic means was used. The functional relation between the two random variables (PAPs and each of the factors determining it in our study) can be expressed by a linear or a non-linear function. Applying linear regression, described by a first-degree equation, gave the regression line (linear model); applying non-linear regression, described by a second-degree equation, gave a parabola-type regression curve (non-linear or polynomial model). We compared and validated these two models by calculating the coefficient of determination (criterion 1), comparing the residuals (criterion 2), applying the AIC (criterion 3) and using the F-test (criterion 4). In the H-group, 47% had pulmonary hypertension that was completely reversible once euthyroidism was achieved. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxin, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to clinical judgment, we consider the polynomial model (graphically, a parabola) to be better than the linear one.
The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
NASA Astrophysics Data System (ADS)
Karan, S.; Sebok, E.; Engesgaard, P. K.
2016-12-01
For identifying groundwater seepage locations in small streams within a headwater catchment, we present a method expanding on the linear regression of air and stream temperatures. By measuring temperatures at two depths, in the stream column and at the streambed-water interface (SWI), we apply metrics from linear regression analysis of temperatures between air/stream and air/SWI (linear regression slope, intercept and coefficient of determination), and from the daily mean temperatures (temperature variance and the average difference between the minimum and maximum daily temperatures). Our study shows that metrics from single-depth stream temperature measurements alone are not sufficient to identify substantial groundwater seepage locations within a headwater stream. Conversely, comparing the metrics from dual-depth temperatures shows significant differences: at groundwater seepage locations, temperatures at the SWI explain merely 43-75 % of the variation, as opposed to ≥91 % for the corresponding stream column temperatures. The figure showing a box plot of the variation in daily mean temperature depicts that at several locations the range differs greatly between the upper and lower loggers because of groundwater seepage. In general, the linear regression shows that at these locations at the SWI, the slopes (<0.25) are substantially lower and the intercepts (>6.5 °C) substantially higher, while the mean diel amplitudes (<0.98 °C) are decreased compared to the remaining locations. The dual-depth approach was applied in a post-glacial fluvial setting, where the metrics analyses overall corresponded to field measurements of groundwater fluxes deduced from vertical streambed temperatures and stream flow accretions. Thus, we propose a method that reliably identifies groundwater seepage locations along the streambed in such settings.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
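The construction described in this abstract for logistic regression can be sketched numerically. The sketch looks for two response probabilities whose log-odds differ by the slope times twice the covariate standard deviation while their average stays equal to the overall response probability. The bisection solver, the function names and the input values are assumptions made for illustration. The sample-size step uses a common textbook normal-approximation formula for comparing two proportions, which is not necessarily the formula the authors applied.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def equivalent_two_sample(slope, sd_x, p_overall):
    """Return (p1, p2) with logit(p2) - logit(p1) = slope * 2 * sd_x
    and (p1 + p2) / 2 = p_overall, found by bisection on p1."""
    delta = slope * 2.0 * sd_x
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2.0
        # p1 + p2 increases monotonically with p1, so bisection applies
        if mid + expit(logit(mid) + delta) < 2.0 * p_overall:
            lo = mid
        else:
            hi = mid
    p1 = (lo + hi) / 2.0
    return p1, expit(logit(p1) + delta)


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Textbook per-group sample size for a two-sided test of two proportions."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    num = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2)))
    return math.ceil((num / (p1 - p2)) ** 2)


# Hypothetical inputs: log-odds slope 0.5 per unit of x, SD of x equal to 1,
# overall response probability 0.3
p1, p2 = equivalent_two_sample(slope=0.5, sd_x=1.0, p_overall=0.3)
n = n_per_group(p1, p2)
```

A slope of zero gives `p1 == p2`, for which the sample-size formula is undefined, so the sketch only makes sense for a non-null alternative.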
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID:27806075
Analysis of reciprocal creatinine plots by two-phase linear regression.
Rowe, P A; Richardson, R E; Burton, P R; Morgan, A G; Burden, R P
1989-01-01
The progression of renal diseases is often monitored by the serial measurement of plasma creatinine. The slope of the linear relation that is frequently found between the reciprocal of creatinine concentration and time delineates the rate of change in renal function. Minor changes in slope, perhaps indicating response to therapeutic intervention, can be difficult to identify and yet be of clinical importance. We describe the application of two-phase linear regression to identify and characterise changes in slope using a microcomputer. The method fits two intersecting lines to the data by computing a least-squares estimate of the position of the slope change and its 95% confidence limits. This avoids the potential bias of fixing the change at a preconceived time corresponding with an alteration in treatment. The program then evaluates the statistical and clinical significance of the slope change and produces a graphical output to aid interpretation.
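A minimal sketch of fitting two intersecting lines by least squares follows. It assumes the model y = a + b·t + c·max(0, t − τ) and a simple grid search over candidate change points τ. The original program's algorithm and its 95% confidence limits are not reproduced here. All names and the synthetic series are hypothetical.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def fit_two_phase(t, y, candidates):
    """Return (sse, tau, [a, b, c]) minimising the residual sum of squares.

    Each candidate tau must lie strictly inside the range of t; otherwise the
    hinge column is collinear with the others and the system is singular.
    """
    best = None
    for tau in candidates:
        rows = [(1.0, ti, max(0.0, ti - tau)) for ti in t]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        coef = solve3(xtx, xty)
        sse = sum((yi - sum(c * ri for c, ri in zip(coef, r))) ** 2
                  for r, yi in zip(rows, y))
        if best is None or sse < best[0]:
            best = (sse, tau, coef)
    return best


# Synthetic series standing in for reciprocal creatinine against time:
# slope -1.0 up to t = 5, then -0.2 afterwards
t = list(range(11))
y = [10.0 - ti + 0.8 * max(0, ti - 5) for ti in t]
sse, tau, coef = fit_two_phase(t, y, candidates=[2, 3, 4, 5, 6, 7, 8])
```

Here `coef[1]` is the slope before the change and `coef[1] + coef[2]` the slope after it. Because the hinge term is zero at τ, the two lines meet at the change point by construction.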
White, Sonia L J; Szűcs, Dénes
2012-01-04
Background The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also provide a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Methods Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Results Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. Conclusion In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice. PMID:22217191
NASA Astrophysics Data System (ADS)
Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua
2018-03-01
The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.
Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei
2014-01-01
The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binomial distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
2012-06-15
...total variability in the data. It is an indication of how much of the variation in the data can be accounted for in the regression model. In... Variation Inflation Factors for each independent variable (predictor) as regressed against all of the other independent variables in the model. The...
Senn, Stephen; Graf, Erika; Caputo, Angelika
2007-12-30
Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence the coarsening property of the propensity score (adjustment for a one-dimensional confounder instead of a high-dimensional covariate) may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, Richard M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
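As a small illustration of the influence statistic named in this abstract, the sketch below computes Cook's D for a simple linear regression, using the standard closed form built from residuals and leverages. The paper applies the statistic to a nonlinear groundwater model under an approximate-linearity assumption, which is not attempted here. The function name and the data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's D for each observation of the least-squares line y = a + b*x."""
    n = len(x)
    p = 2  # number of estimated parameters: intercept and slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)          # residual variance
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [(e ** 2 / (p * s2)) * h / (1.0 - h) ** 2
            for e, h in zip(residuals, leverage)]


# Made-up data: the last point lies far from the others in x and off the trend
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
```

Observations with a large D relative to the rest are candidates for closer inspection; the cut-off used in practice is a judgment call and is not taken from the paper.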
A novel strategy for forensic age prediction by DNA methylation and support vector regression model
Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei
2015-01-01
Large deviations arising from differences in prediction model, gender and population have limited the application of DNA methylation markers to age estimation. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R² > 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers with ages of 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products and 11 of them were used to build the age prediction models. After comparing four different models, including multivariate linear regression, multivariate nonlinear regression, back propagation neural network and support vector regression (SVR), SVR was identified as the most robust model, with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years predicted by only six of the 11 loci, as well as a smaller cross-validated error than the linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating the individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy as the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but only 25 of the total of 84 weights are nonzero, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
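The method of least squares mentioned in this abstract can be illustrated with a short sketch using the standard closed-form estimates for one predictor and one outcome. The function name and the small dataset are hypothetical and do not come from the article.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # minimises the sum of squared residuals
    intercept = mean_y - slope * mean_x    # the line passes through the point of means
    r_squared = sxy ** 2 / (sxx * syy)     # share of the variance in y explained
    return intercept, slope, r_squared


# Illustrative data only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope, r_squared = fit_line(x, y)
```

For one predictor, `r_squared` equals the squared Pearson correlation between `x` and `y`, which is why it can be computed from the same three sums.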
Partitioning sources of variation in vertebrate species richness
Boone, R.B.; Krohn, W.B.
2000-01-01
Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that would describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km²) of 275 species using literature and expert review. Ranges were combined into species richness maps, and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression. Methods were used that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and using both tree and linear regression in analyses, we explored the degree of variation in species richness for each vertebrate group that could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (each with 84-82% deviance explained). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%) and birds (57%).
Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.
do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro
2015-01-01
Objective: To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. Methods: This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a confidence interval of 95%, was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criterion for variable inclusion in the multiple linear regression model was association with the dependent variable in the simple linear regression analysis, considering p<0.20. Significance level was α <5%. Results: Of the 226 women included, 200 (88.5%) were 20-44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. Conclusions: This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. PMID:26100593
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models and accounts for the intrinsic complexity of the data. We start with standard cubic spline regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compare cubic regression splines with linear piecewise splines, and with varying numbers and positions of knots. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect model with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height.
We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves than on the coefficients. Moreover, use of cubic regression splines provides biologically meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-01-15
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions namely coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2) was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
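The error functions named in this abstract can be sketched as follows. The abstract does not give their formulas, so the definitions below are the forms commonly quoted in the sorption-isotherm literature and should be read as an assumption, not as the authors' exact expressions. The function name `error_functions` and its arguments are hypothetical.

```python
import math

def error_functions(q_exp, q_calc, n_params):
    """Common isotherm error functions (assumed literature forms).

    q_exp    -- measured equilibrium uptakes
    q_calc   -- uptakes predicted by the fitted isotherm
    n_params -- number of isotherm parameters (2 or 3 in the abstract)
    """
    n = len(q_exp)
    dof = n - n_params  # degrees of freedom used by MPSD and HYBRID
    resid = [e - c for e, c in zip(q_exp, q_calc)]
    return {
        # sum of the errors squared
        "ERRSQ": sum(r * r for r in resid),
        # sum of the absolute errors
        "EABS": sum(abs(r) for r in resid),
        # average relative error, in percent
        "ARE": 100.0 / n * sum(abs(r / e) for r, e in zip(resid, q_exp)),
        # Marquardt's percent standard deviation
        "MPSD": 100.0 * math.sqrt(
            sum((r / e) ** 2 for r, e in zip(resid, q_exp)) / dof
        ),
        # hybrid fractional error function
        "HYBRID": 100.0 / dof * sum(r * r / e for r, e in zip(resid, q_exp)),
    }
```

In non-linear regression, one of these values would be minimised over the isotherm parameters; different error functions weight low- and high-concentration points differently, which is one reason they can select different optimum isotherms.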
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
The Danish North Sea fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) that range in age from Paleocene to Miocene. In this study, seismic and well log data sets are integrated to determine the chalk sand distribution in a Danish North Sea field. The integration is performed using seismic inversion analysis and seismic multi-attribute analysis. The seismic inversion algorithm used to derive acoustic impedance (AI) is a model-based technique. The derived AI is then used as an external attribute for the input of the multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate linear and non-linear transformations among well log properties. For the linear model, the transformation is selected by weighting with step-wise linear regression (SWR), while the non-linear model uses probabilistic neural networks (PNN). The porosity estimated by PNN fits the well log data better than the SWR results. This result is understandable, since PNN performs non-linear regression, so the relationship between the attribute data and the predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity values ranging from 23% up to 30%.
Abu Bakar, S N; Aspalilah, A; AbdelNasser, I; Nurliza, A; Hairuliza, M J; Swarhib, M; Das, S; Mohd Nor, F
2017-01-01
Stature is one of the characteristics that can be used to identify a person, besides age, sex and racial affiliation. This is useful when the body found is dismembered, mutilated or decomposed, and helps in narrowing down the missing person's identity. The main aim of the present study was to construct regression functions for stature estimation by using lower limb bones in the Malaysian population. The sample comprised 87 adult individuals (81 males, 6 females) aged between 20 and 79 years. Thigh length, lower leg length, leg length, foot length, foot height and foot breadth were measured with a ruler and measuring tape. Statistical analysis involved an independent t-test to analyse the difference between lower limbs in males and females. Pearson's correlation test was used to analyse correlations between lower limb parameters and stature, and linear regressions were used to form equations. The paired t-test was used to compare actual stature with the stature estimated by the equations formed. Using the independent t-test, there was a significant difference (p < 0.05) between males and females with regard to leg length, thigh length, lower leg length, foot length and foot breadth. Thigh length, leg length and foot length were observed to have strong correlations with stature, with r = 0.75, r = 0.81 and r = 0.69, respectively. Linear regressions were formulated for stature estimation. The paired t-test showed no significant difference between actual stature and estimated stature. It is concluded that regression functions can be used to estimate stature to identify skeletal remains in the Malaysian population.
Geographical variation of cerebrovascular disease in New York State: the correlation with income
Han, Daikwon; Carrow, Shannon S; Rogerson, Peter A; Munschauer, Frederick E
2005-01-01
Background Income is known to be associated with cerebrovascular disease; however, little is known about the more detailed relationship between cerebrovascular disease and income. We examined the hypothesis that the geographical distribution of cerebrovascular disease in New York State may be predicted by a nonlinear model using income as a surrogate socioeconomic risk factor. Results We used spatial clustering methods to identify areas with high and low prevalence of cerebrovascular disease at the ZIP code level after smoothing rates and correcting for edge effects; geographic locations of high and low clusters of cerebrovascular disease in New York State were identified with and without income adjustment. To examine effects of income, we calculated the excess number of cases using a non-linear regression with cerebrovascular disease rates taken as the dependent variable and income and income squared taken as independent variables. The resulting regression equation was: excess rate = 32.075 - 1.22×10^-4 (income) + 8.068×10^-10 (income^2), and both income and income squared variables were significant at the 0.01 level. When income was included as a covariate in the non-linear regression, the number and size of clusters of high cerebrovascular disease prevalence decreased. Some 87 ZIP codes exceeded the critical value of the local statistic yielding a relative risk of 1.2. The majority of low cerebrovascular disease prevalence geographic clusters disappeared when the non-linear income effect was included. For linear regression, the excess rate of cerebrovascular disease falls with income; each $10,000 increase in median income of each ZIP code resulted in an average reduction of 3.83 observed cases. The significant nonlinear effect indicates a lessening of this income effect with increasing income. 
Conclusion Income is a non-linear predictor of excess cerebrovascular disease rates, with both low and high observed cerebrovascular disease rate areas associated with higher income. Income alone explains a significant amount of the geographical variance in cerebrovascular disease across New York State since both high and low clusters of cerebrovascular disease dissipate or disappear with income adjustment. Geographical modeling, including non-linear effects of income, may allow for better identification of other non-traditional risk factors. PMID:16242043
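The quadratic equation reported in this abstract can be explored with a short sketch. The coefficients are taken from the abstract; the function names are hypothetical, and income is assumed to be in the same units the authors used (presumably dollars of median ZIP code income).

```python
# Coefficients as reported in the abstract
B0 = 32.075       # intercept
B1 = -1.22e-4     # coefficient on income
B2 = 8.068e-10    # coefficient on income squared

def excess_rate(income):
    """Predicted excess rate from the reported quadratic fit."""
    return B0 + B1 * income + B2 * income ** 2

def marginal_effect(income):
    """Derivative of the fitted curve: change in excess rate per unit income."""
    return B1 + 2.0 * B2 * income

def turning_point():
    """Income at which the fitted parabola reaches its minimum."""
    return -B1 / (2.0 * B2)
```

Because the squared term is positive, the marginal effect is negative at low incomes and shrinks toward zero as income rises, which matches the abstract's statement that the income effect lessens with increasing income. Under these coefficients the fitted curve bottoms out at roughly 75,600 income units.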
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
Linear Modeling and Evaluation of Controls on Flow Response in Western Post-Fire Watersheds
NASA Astrophysics Data System (ADS)
Saxe, S.; Hogue, T. S.; Hay, L.
2015-12-01
This research investigates the impact of wildfires on watershed flow regimes throughout the western United States, specifically focusing on evaluation of fire events within specified subregions and determination of the impact of climate and geophysical variables in post-fire flow response. Fire events were collected through federal and state-level databases and streamflow data were collected from U.S. Geological Survey stream gages. 263 watersheds were identified with at least 10 years of continuous pre-fire daily streamflow records and 5 years of continuous post-fire daily flow records. For each watershed, percent changes in runoff ratio (RO), annual seven day low-flows (7Q2) and annual seven day high-flows (7Q10) were calculated from pre- to post-fire. Numerous independent variables were identified for each watershed and fire event, including topographic, land cover, climate, burn severity, and soils data. The national watersheds were divided into five regions through K-clustering and a lasso linear regression model, applying the Leave-One-Out calibration method, was calculated for each region. Nash-Sutcliffe Efficiency (NSE) was used to determine the accuracy of the resulting models. The regions encompassing the United States along and west of the Rocky Mountains, excluding the coastal watersheds, produced the most accurate linear models. The Pacific coast region models produced poor and inconsistent results, indicating that the regions need to be further subdivided. Presently, the RO and high-flow (HF) response variables appear to be more easily modeled than the low-flow (LF) variable. Results of linear regression modeling showed varying importance of watershed and fire event variables, with conflicting correlation between land cover types and soil types by region. The addition of further independent variables and constriction of current variables based on correlation indicators is ongoing and should allow for more accurate linear regression modeling.
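The Nash-Sutcliffe Efficiency used here as the accuracy measure has a standard definition, sketched below. The function name and argument names are hypothetical; how the authors paired observed and leave-one-out predicted values is not stated in the abstract.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations.

    Returns 1.0 for a perfect model and 0.0 for a model that is no
    better than always predicting the mean of the observations;
    negative values indicate a model worse than that mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread
```

In a leave-one-out setting, `simulated` would hold the prediction for each watershed from a model fitted without that watershed.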
Modeling Success: Using Preenrollment Data to Identify Academically At-Risk Students
ERIC Educational Resources Information Center
Gansemer-Topf, Ann M.; Compton, Jonathan; Wohlgemuth, Darin; Forbes, Greg; Ralston, Ekaterina
2015-01-01
Improving student success and degree completion is one of the core principles of strategic enrollment management. To address this principle, institutional data were used to develop a statistical model to identify academically at-risk students. The model employs multiple linear regression techniques to predict students at risk of earning below a…
ERIC Educational Resources Information Center
Roulette-McIntyre, Ovella; Bagaka's, Joshua G.; Drake, Daniel D.
2005-01-01
This study identified parental practices that relate positively to high school students' academic performance. Parents of 643 high school students participated in the study. Data analysis, using a multiple linear regression model, shows parent-school connection, student gender, and race are significant predictors of student academic performance.…
A model of the human in a cognitive prediction task.
NASA Technical Reports Server (NTRS)
Rouse, W. B.
1973-01-01
The human decision maker's behavior when predicting future states of discrete linear dynamic systems driven by zero-mean Gaussian processes is modeled. The task is on a slow enough time scale that physiological constraints are insignificant compared with cognitive limitations. The model is basically a linear regression system identifier with a limited memory and noisy observations. Experimental data are presented and compared to the model.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
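As a minimal illustration of the estimation step this chapter covers, the sketch below computes the least-squares slope and intercept together with Pearson's correlation coefficient from two equal-length lists. The function name is hypothetical and the code is not taken from the chapter.

```python
import math

def simple_linear_regression(x, y):
    """Return (slope, intercept, r) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r
```

The slope equals r multiplied by the ratio of the standard deviations of y and x, which is the link between correlation and simple linear regression.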
do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado Junior, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro
2015-01-01
To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a confidence interval of 95% was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criteria for variable inclusion in the multiple linear regression model was the association with the dependent variable in the simple linear regression analysis, considering p<0.20. Significance level was α<5%. From 226 women included, 200 (88.5%) were 20 to 44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. Copyright © 2015 Sociedade de Pediatria de São Paulo. Publicado por Elsevier Editora Ltda. All rights reserved.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M
In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic, linear, Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals' author guidelines explicitly recommend following the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models such as logistic, linear, Cox and Poisson regression are increasingly used in published observational studies, both at the international level and in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.
Chicken barn climate and hazardous volatile compounds control using simple linear regression and PID
NASA Astrophysics Data System (ADS)
Abdullah, A. H.; Bakar, M. A. A.; Shukor, S. A. A.; Saad, F. S. A.; Kamis, M. S.; Mustafa, M. H.; Khalid, N. S.
2016-07-01
The hazardous volatile compounds from chicken manure in chicken barns are a potential health threat to the farm animals and workers. Ammonia (NH3) and hydrogen sulphide (H2S) produced in chicken barns are influenced by climate changes. The Electronic Nose (e-nose) is used for sampling the barn's air, temperature and humidity data. Simple Linear Regression is used to identify the correlation between temperature-humidity, humidity-ammonia and ammonia-hydrogen sulphide. MATLAB Simulink software was used for the sample data analysis using a PID controller. Results show that a PID controller tuned with the Ziegler-Nichols technique can improve control of the climate in a chicken barn.
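The abstract names Ziegler-Nichols tuning but does not say which variant or which gains were used. The sketch below therefore assumes the classic closed-loop rule for a full PID controller (Kp = 0.6 Ku, Ti = Tu/2, Td = Tu/8) and a plain discrete-time PID update; all names are hypothetical, and the authors' Simulink model is not reproduced.

```python
def ziegler_nichols_pid(ku, tu):
    """Classic Ziegler-Nichols PID gains (assumed variant).

    ku -- ultimate gain at which the loop oscillates steadily
    tu -- period of that oscillation
    Returns (kp, ki, kd) for the parallel form kp*e + ki*int(e) + kd*de/dt.
    """
    kp = 0.6 * ku
    ti = 0.5 * tu      # integral time
    td = 0.125 * tu    # derivative time
    return kp, kp / ti, kp * td

class PID:
    """Minimal discrete PID controller with a fixed sample time dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                    # rectangular integration
        derivative = (error - self.prev_error) / self.dt    # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a barn-climate setting, `measurement` might be a temperature or humidity reading and the returned value an actuator command such as a fan speed; that mapping is an illustration only.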
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Shimizu, Yu; Yoshimoto, Junichiro; Takamura, Masahiro; Okada, Go; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area. PMID:28700672
NASA Technical Reports Server (NTRS)
Wilson, Edward (Inventor)
2006-01-01
The present invention is a method for identifying unknown parameters in a system having a set of governing equations describing its behavior that cannot be put into regression form with the unknown parameters linearly represented. In this method, the vector of unknown parameters is segmented into a plurality of groups where each individual group of unknown parameters may be isolated linearly by manipulation of said equations. Multiple concurrent and independent recursive least squares identification of each said group run, treating other unknown parameters appearing in their regression equation as if they were known perfectly, with said values provided by recursive least squares estimation from the other groups, thereby enabling the use of fast, compact, efficient linear algorithms to solve problems that would otherwise require nonlinear solution approaches. This invention is presented with application to identification of mass and thruster properties for a thruster-controlled spacecraft.
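The building block this abstract relies on is the ordinary recursive least squares (RLS) update for a model that is linear in its unknown parameters. The sketch below shows only that generic step in plain Python; the patent's grouping of parameters and the exchange of estimates between concurrent estimators are not reproduced, and all names are hypothetical.

```python
def rls_update(theta, P, x, y, forgetting=1.0):
    """One recursive least squares step for the model y ~ x . theta.

    theta      -- current parameter estimate (list of floats)
    P          -- current covariance-like matrix (list of lists, symmetric)
    x          -- regressor vector for the new sample
    y          -- new scalar measurement
    forgetting -- forgetting factor in (0, 1]; 1.0 weights all samples equally
    """
    n = len(theta)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(x[i] * Px[i] for i in range(n))
    gain = [Px[i] / denom for i in range(n)]
    error = y - sum(x[i] * theta[i] for i in range(n))  # prediction error
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P <- (P - gain * x^T P) / forgetting; x^T P equals Px^T because P is symmetric
    new_P = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_theta, new_P
```

In the scheme described above, one such estimator would run per parameter group, with the regressor `x` and measurement `y` for each group built by treating the latest estimates from the other groups as if they were known.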
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
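The linear method referred to in this abstract can be illustrated with the widely used linearized form of the pseudo-second order model, t/q = 1/(k qe^2) + t/qe, which is often associated with Ho's expression. The sketch below regresses t/q on t to recover qe and k; the function names are hypothetical, and the other linearizations compared in the paper, as well as the non-linear fit, are not shown.

```python
def pso_uptake(t, qe, k):
    """Pseudo-second order uptake q(t) = k*qe^2*t / (1 + k*qe*t)."""
    return k * qe ** 2 * t / (1.0 + k * qe * t)

def fit_pso_linearized(times, uptakes):
    """Estimate (qe, k) by ordinary least squares on t/q versus t.

    The slope of the fitted line is 1/qe and the intercept is 1/(k*qe^2).
    Times must be positive so that t/q is defined.
    """
    ys = [t / q for t, q in zip(times, uptakes)]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    qe = 1.0 / slope
    k = slope ** 2 / intercept  # same as 1 / (intercept * qe^2)
    return qe, k
```

With noise-free data the linearized fit recovers the parameters exactly; with real measurements the transformation t/q distorts the error structure, which is the usual argument for preferring non-linear regression.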
Age and mortality after injury: is the association linear?
Friese, R S; Wynne, J; Joseph, B; Hashmi, A; Diven, C; Pandit, V; O'Keeffe, T; Zangbar, B; Kulvatunyou, N; Rhee, P
2014-10-01
Multiple studies have demonstrated a linear association between advancing age and mortality after injury. An inflection point, or an age at which outcomes begin to differ, has not been previously described. We hypothesized that the relationship between age and mortality after injury is non-linear and an inflection point exists. We performed a retrospective cohort analysis at our urban level I center from 2007 through 2009. All patients aged 65 years and older with the admission diagnosis of injury were included. Non-parametric logistic regression was used to identify the functional form between mortality and age. Multivariate logistic regression was utilized to explore the association between age and mortality. Age 65 years was used as the reference. Significance was defined as p < 0.05. A total of 1,107 patients were included in the analysis. One-third required intensive care unit (ICU) admission and 48 % had traumatic brain injury. 229 patients (20.6 %) were 84 years of age or older. The overall mortality was 7.2 %. Our model indicates that mortality is a quadratic function of age. After controlling for confounders, age is associated with mortality with a regression coefficient of 1.08 for the linear term (p = 0.02) and a regression coefficient of -0.006 for the quadratic term (p = 0.03). The model identified 84.4 years of age as the inflection point at which mortality rates begin to decline. The risk of death after injury varies linearly with age until 84 years. After 84 years of age, the mortality rates decline. These findings may reflect the varying severity of comorbidities and differences in baseline functional status in elderly trauma patients. 
Specifically, a proportion of our injured patient population less than 84 years old may be more frail, contributing to increased mortality after trauma, whereas a larger proportion of our injured patients over 84 years old, by virtue of reaching this advanced age, may, in fact, be less frail, contributing to less risk of death.
ERIC Educational Resources Information Center
Stratton, Beverly D.; And Others
Demographic data on 92 subjects identified as having reading problems were used to develop equations useful in identifying high risk, reading disabled students. Multiple linear regression analysis of the data indicated that reading disability (1) had a significant positive relationship with birth order and number of siblings; (2) had a positive…
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research involved the relationship between 20 topsoil variates, analyzed prior to planting, and paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed, without multicollinearity among the independent variables. Fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model alone, with a lower mean square error.
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
DEVELOPMENT OF THE VIRTUAL BEACH MODEL, PHASE 1: AN EMPIRICAL MODEL
With increasing attention focused on the use of multiple linear regression (MLR) modeling of beach fecal bacteria concentration, the validity of the entire statistical process should be carefully evaluated to assure satisfactory predictions. This work aims to identify pitfalls an...
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novice data aggregation to satisfy linear regression assumptions; namely error structure normality ...
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. 
A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
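A cut-point search of the kind mentioned above could look like the sketch below. The abstract does not state which criterion was used to trade off true and false positives; Youden's J (TPR minus FPR) is assumed here, and the probabilities and exceedance labels are invented for illustration.

```python
import numpy as np

def youden_cut_point(probs, exceeded):
    """Return the probability threshold that maximizes TPR - FPR (Youden's J)."""
    probs = np.asarray(probs, dtype=float)
    exceeded = np.asarray(exceeded, dtype=bool)
    best_cut, best_j = None, -np.inf
    for cut in np.unique(probs):           # candidate thresholds, ascending
        predicted = probs >= cut
        tpr = np.sum(predicted & exceeded) / np.sum(exceeded)
        fpr = np.sum(predicted & ~exceeded) / np.sum(~exceeded)
        if tpr - fpr > best_j:              # ties keep the lower threshold
            best_cut, best_j = cut, tpr - fpr
    return best_cut, best_j

# Hypothetical logistic-model probabilities and observed exceedances
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
exceeded = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = youden_cut_point(probs, exceeded)
```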
High school science enrollment of black students
NASA Astrophysics Data System (ADS)
Goggins, Ellen O.; Lindbeck, Joy S.
How can the high school science enrollment of black students be increased? School and home counseling and classroom procedures could benefit from variables identified as predictors of science enrollment. The problem in this study was to identify a set of variables which characterize science course enrollment by black secondary students. The population consisted of a subsample of 3963 black high school seniors from The High School and Beyond 1980 Base-Year Survey. Using multiple linear regression, backward regression, and correlation analyses, the US Census regions and grades mostly As and Bs in English were found to be significant predictors of the number of science courses scheduled by black seniors.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
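For readers unfamiliar with the cumulative logit model, the sketch below shows how it turns a linear predictor into probabilities for ordered score categories, using the common parameterization P(score <= k) = logistic(cutpoint_k - x·beta). The feature values, coefficients, and cut points are hypothetical and are not taken from the article.

```python
import numpy as np

def cumulative_logit_probs(x, beta, cutpoints):
    """Category probabilities under P(Y <= k) = logistic(cutpoint_k - x.beta)."""
    eta = float(np.dot(x, beta))
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints, dtype=float) - eta)))
    cum = np.concatenate(([0.0], cum, [1.0]))   # P(Y <= lowest-1)=0, P(Y <= top)=1
    return np.diff(cum)                          # per-category probabilities

# Hypothetical essay features, coefficients, and increasing cut points
probs = cumulative_logit_probs(x=[1.0, 0.5], beta=[0.4, -0.2],
                               cutpoints=[-1.0, 0.0, 1.5])
```

Unlike a linear regression on the score, the output here is a full distribution over the ordered categories, which is the main practical difference between the two models.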
Suzuki, Taku; Iwamoto, Takuji; Shizu, Kanae; Suzuki, Katsuji; Yamada, Harumoto; Sato, Kazuki
2017-05-01
This retrospective study was designed to investigate prognostic factors for postoperative outcomes for cubital tunnel syndrome (CubTS) using multiple logistic regression analysis with a large number of patients. Eighty-three patients with CubTS who underwent surgeries were enrolled. The following potential prognostic factors for disease severity were selected according to previous reports: sex, age, type of surgery, disease duration, body mass index, cervical lesion, presence of diabetes mellitus, Workers' Compensation status, preoperative severity, and preoperative electrodiagnostic testing. Postoperative severity of disease was assessed 2 years after surgery by Messina's criteria which is an outcome measure specifically for CubTS. Bivariate analysis was performed to select candidate prognostic factors for multiple linear regression analyses. Multiple logistic regression analysis was conducted to identify the association between postoperative severity and selected prognostic factors. Both bivariate and multiple linear regression analysis revealed only preoperative severity as an independent risk factor for poor prognosis, while other factors did not show any significant association. Although conflicting results exist regarding prognosis of CubTS, this study supports evidence from previous studies and concludes early surgical intervention portends the most favorable prognosis. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Ulibarri, Monica D.; Hiller, Sarah P.; Lozada, Remedios; Rangel, M. Gudelia; Stockman, Jamila K.; Silverman, Jay G.; Ojeda, Victoria D.
2013-01-01
This mixed methods study examined the prevalence and characteristics of physical and sexual abuse and depression symptoms among 624 injection drug-using female sex workers (FSW-IDUs) in Tijuana and Ciudad Juarez, Mexico; a subset of 47 from Tijuana also underwent qualitative interviews. Linear regressions identified correlates of current depression symptoms. In the interviews, FSW-IDUs identified drug use as a method of coping with the trauma they experienced from abuse that occurred before and after age 18 and during the course of sex work. In a multivariate linear regression model, two factors—ever experiencing forced sex and forced sex in the context of sex work—were significantly associated with higher levels of depression symptoms. Our findings suggest the need for integrated mental health and drug abuse services for FSW-IDUs addressing history of trauma as well as for further research on violence revictimization in the context of sex work in Mexico. PMID:23737808
NASA Astrophysics Data System (ADS)
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
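A much-simplified sketch of the idea is given below: fit a regression in each sliding window, map each fit to a discrete pattern, and count transitions between consecutive patterns as weighted directed edges. It is an assumption-laden illustration: the patterns here use slope intervals only, whereas the abstract also includes significance-test results, and the function name and bin edges are hypothetical.

```python
import numpy as np
from collections import Counter

def regression_pattern_network(x, y, window, slope_edges):
    """Sliding-window slopes -> discrete patterns -> weighted directed edges."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    patterns = []
    for start in range(len(x) - window + 1):
        xs, ys = x[start:start + window], y[start:start + window]
        slope, _intercept = np.polyfit(xs, ys, 1)      # least-squares line
        patterns.append(int(np.digitize(slope, slope_edges)))
    # Edge weight = how often one pattern is followed by another
    edges = Counter(zip(patterns[:-1], patterns[1:]))
    return patterns, edges

# Hypothetical series: y rises with x, then falls
x = np.arange(8)
y = [0, 1, 2, 3, 3, 2, 1, 0]
patterns, edges = regression_pattern_network(x, y, window=3, slope_edges=[0.0])
```

With a single bin edge at 0, pattern 1 means a non-negative slope and pattern 0 a negative one; the resulting edge counts can be loaded into any graph library to compute out-degree or betweenness.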
Nicholas A. Povak; Paul F. Hessburg; Todd C. McDonnell; Keith M. Reynolds; Timothy J. Sullivan; R. Brion Salter; Bernard J. Crosby
2014-01-01
Accurate estimates of soil mineral weathering are required for regional critical load (CL) modeling to identify ecosystems at risk of the deleterious effects from acidification. Within a correlative modeling framework, we used modeled catchment-level base cation weathering (BCw) as the response variable to identify key environmental correlates and predict a continuous...
Modeling and forecasting US presidential election using learning algorithms
NASA Astrophysics Data System (ADS)
Zolghadr, Mohammad; Niaki, Seyed Armin Akhavan; Niaki, S. T. A.
2017-09-01
The primary objective of this research is to obtain an accurate forecasting model for the US presidential election. To identify a reliable model, artificial neural networks (ANN) and support vector regression (SVR) models are compared based on some specified performance measures. Moreover, six independent variables such as GDP, unemployment rate, the president's approval rate, and others are considered in a stepwise regression to identify significant variables. The president's approval rate is identified as the most significant variable, based on which eight other variables are identified and considered in the model development. Preprocessing methods are applied to prepare the data for the learning algorithms. The proposed procedure significantly increases the accuracy of the model by 50%. The learning algorithms (ANN and SVR) proved to be superior to linear regression based on each method's calculated performance measures. The SVR model is identified as the most accurate model among the other models as this model successfully predicted the outcome of the election in the last three elections (2004, 2008, and 2012). The proposed approach significantly increases the accuracy of the forecast.
Positive Parenting Practices Associated with Subsequent Childhood Weight Change
ERIC Educational Resources Information Center
Avula, Rasmi; Gonzalez, Wendy; Shapiro, Cheri J.; Fram, Maryah S.; Beets, Michael W.; Jones, Sonya J.; Blake, Christine E.; Frongillo, Edward A.
2011-01-01
We aimed to identify positive parenting practices that set children on differential weight-trajectories. Parenting practices studied were cognitively stimulating activities, limit-setting, disciplinary practices, and parent warmth. Data from two U.S. national longitudinal data sets and linear and logistic regression were used to examine…
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-06-01
This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
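To make the IRLS idea concrete, here is a minimal sketch of an iteratively re-weighted least squares line fit. The abstract does not specify the weight function; Huber weights with a MAD-based scale are assumed, and the calibration data (including the single outlier) are invented.

```python
import numpy as np

def irls_line(x, y, k=1.345, iterations=50):
    """Robust straight-line fit by IRLS with Huber weights (assumed choice)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(iterations):
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        resid = y - X @ beta
        # Robust residual scale from the median absolute deviation
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / (k * scale)
        w = 1.0 / np.maximum(u, 1.0)     # weight 1 inside the threshold, <1 outside
    return beta                           # [intercept, slope]

# Hypothetical calibration data: y = 2 + 3x with small noise and one outlier
x = np.arange(10.0)
y = 2.0 + 3.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[9] += 50.0
robust_intercept, robust_slope = irls_line(x, y)
ols_slope = np.polyfit(x, y, 1)[0]
```

In this toy case the ordinary least squares slope is pulled strongly toward the outlier, while the reweighted fit stays close to the underlying slope of 3.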
NASA Astrophysics Data System (ADS)
Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.
2014-09-01
Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (adjusted R² = 0.632; the root-mean-squared error of estimated AGB, obtained from cross-validation, was 13.3 Mg ha⁻¹, or 37.7%), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
Casero-Alonso, V; López-Fidalgo, J; Torsney, B
2017-01-01
Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M
2017-06-01
Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of multilocus model in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate kinship matrix as polygenic background control. The selected SNPs in multilocus model were further detected for their association with the trait by empirical Bayes and likelihood ratio test. We herein refer to this method as the pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had less false positive rate and required less computing time than Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, multilocus random-SNP-effect mixed linear model and fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
1974-01-01
Digital Image Restoration under a Regression Model: The Unconstrained, Linear Equality and Inequality Constrained Approaches. Nelson Delfino d'Avila Mascarenhas. Image... Report 520, January 1974. ...a two-dimensional form adequately describes the linear model. A discretization is performed by using quadrature methods. By trans
Modelling and Closed-Loop System Identification of a Quadrotor-Based Aerial Manipulator
NASA Astrophysics Data System (ADS)
Dube, Chioniso; Pedro, Jimoh O.
2018-05-01
This paper presents the modelling and system identification of a quadrotor-based aerial manipulator. The aerial manipulator model is first derived analytically using the Newton-Euler formulation for the quadrotor and Recursive Newton-Euler formulation for the manipulator. The aerial manipulator is then simulated with the quadrotor under Proportional Derivative (PD) control, with the manipulator in motion. The simulation data is then used for system identification of the aerial manipulator. Auto Regressive with eXogenous inputs (ARX) models are obtained from the system identification for linear accelerations $\ddot{X}$ and $\ddot{Y}$ and yaw angular acceleration $\ddot{\psi}$. For linear acceleration $\ddot{Z}$, and pitch and roll angular accelerations $\ddot{\theta}$ and $\ddot{\phi}$, Auto Regressive Moving Average with eXogenous inputs (ARMAX) models are identified.
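As an illustration of what identifying an ARX model involves, the sketch below estimates ARX coefficients by ordinary least squares on lagged outputs and inputs. The sign convention y[t] = sum(a_i * y[t-i]) + sum(b_j * u[t-j]), the model orders, and the simulated first-order system are assumptions for the example and do not come from the paper.

```python
import numpy as np

def fit_arx(y, u, na=1, nb=1):
    """Least-squares ARX fit: y[t] = sum a_i*y[t-i] + sum b_j*u[t-j] + e[t]."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(na, nb)
    rows = []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, na + 1)]
        past_u = [u[t - j] for j in range(1, nb + 1)]
        rows.append(past_y + past_u)          # one regressor row per time step
    phi = np.array(rows)
    theta = np.linalg.lstsq(phi, y[start:], rcond=None)[0]
    return theta[:na], theta[na:]

# Hypothetical noise-free system: y[t] = 0.8*y[t-1] + 0.5*u[t-1]
t = np.arange(60)
u = np.sin(0.3 * t) + 0.5 * np.cos(1.1 * t)
y = np.zeros(60)
for k in range(1, 60):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1]
a_hat, b_hat = fit_arx(y, u)
```

An ARMAX model adds a moving-average term on the noise, which makes the estimation problem nonlinear in the parameters, so it cannot be solved with a single least-squares step like this one.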
Element enrichment factor calculation using grain-size distribution and functional data regression.
Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R
2015-01-01
In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Who Will Win?: Predicting the Presidential Election Using Linear Regression
ERIC Educational Resources Information Center
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus graphing calculators, Microsoft Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
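The paper-and-pencil route via the normal equations can be mirrored in a few lines of code. The sketch below uses the standard closed-form solution for a simple least-squares line; the data points are invented and are not the election data from the article.

```python
def least_squares_line(xs, ys):
    """Slope and intercept from the normal equations for y = intercept + slope*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

# Hypothetical points lying exactly on y = 1 + 2x
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
```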
Predictors of Adolescent Breakfast Consumption: Longitudinal Findings from Project EAT
ERIC Educational Resources Information Center
Bruening, Meg; Larson, Nicole; Story, Mary; Neumark-Sztainer, Dianne; Hannan, Peter
2011-01-01
Objective: To identify predictors of breakfast consumption among adolescents. Methods: Five-year longitudinal study Project EAT (Eating Among Teens). Baseline surveys were completed in Minneapolis-St. Paul schools and by mail at follow-up by youth (n = 800) transitioning from middle to high school. Linear regression models examined associations…
Authentic Practices as Contexts for Learning to Draw Inferences beyond Correlated Data
ERIC Educational Resources Information Center
Dierdorp, Adri; Bakker, Arthur; Eijkelhof, Harrie; van Maanen, Jan
2011-01-01
To support 11th-grade students' informal inferential reasoning, a teaching and learning strategy was designed based on authentic practices in which professionals use correlation or linear regression. These practices included identifying suitable physical training programmes, dyke monitoring, and the calibration of measurement instruments. The…
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Partial Least Squares Regression Models for the Analysis of Kinase Signaling.
Bourgeois, Danielle L; Kreeger, Pamela K
2017-01-01
Partial least squares regression (PLSR) is a data-driven modeling approach that can be used to analyze multivariate relationships between kinase networks and cellular decisions or patient outcomes. In PLSR, a linear model relating an X matrix of independent variables and a Y matrix of dependent variables is generated by extracting the factors with the strongest covariation. While the identified relationship is correlative, PLSR models can be used to generate quantitative predictions for new conditions or perturbations to the network, allowing for mechanisms to be identified. This chapter will provide a brief explanation of PLSR and provide an instructive example to demonstrate the use of PLSR to analyze kinase signaling.
Agarwal, Parul; Sambamoorthi, Usha
2015-12-01
Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least square regressions (OLS) were used to examine differences in healthcare expenditures between those with and without depression. Post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analysis accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p < 0.001) compared to those without depression. Post-regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of the acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 with the Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and the age-specific incidence trend (Cochran-Armitage trend P value
Yang, Ruiqi; Wang, Fei; Zhang, Jialing; Zhu, Chonglei; Fan, Limei
2015-05-19
To establish the reference values of thalamus, caudate nucleus and lenticular nucleus diameters through the fetal thalamic transverse section. A total of 265 fetuses at our hospital were randomly selected from November 2012 to August 2014, and the transverse and length diameters of the thalamus, caudate nucleus and lenticular nucleus were measured. SPSS 19.0 statistical software was used to calculate the regression curve of fetal diameter changes against gestational weeks of pregnancy. P < 0.05 was considered statistically significant. The linear regression equation of fetal thalamic length diameter and gestational week was: Y = 0.051X+0.201, R = 0.876; the linear regression equation of thalamic transverse diameter and fetal gestational week was: Y = 0.031X+0.229, R = 0.817; the linear regression equation of fetal head of caudate nucleus length diameter and gestational age was: Y = 0.033X+0.101, R = 0.722; the linear regression equation of fetal head of caudate nucleus transverse diameter and gestational week was: Y = 0.025X - 0.046, R = 0.711; the linear regression equation of fetal lentiform nucleus length diameter and gestational week was: Y = 0.046X+0.229, R = 0.765; the linear regression equation of fetal lentiform nucleus diameter and gestational week was: Y = 0.025X - 0.05, R = 0.772. Ultrasonic measurement of the diameters of the fetal thalamus, caudate nucleus, and lenticular nucleus through the thalamic transverse section is simple and convenient. The measurements increase with fetal gestational weeks, and there is a linear regression relationship between them.
Local Linear Regression for Data with AR Errors.
Li, Runze; Li, Yan
2009-07-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.
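For orientation, the naive local linear regression that the paper uses as its baseline (the working-independence version) can be sketched as a kernel-weighted least squares fit around the target point. This is a minimal sketch, not the authors' profile least squares procedure for AR errors; the Gaussian kernel, the function name and the bandwidth argument are assumptions chosen for illustration.

```python
import math

def local_linear_fit(xs, ys, x0, bandwidth):
    """Estimate the regression function at x0 by a kernel-weighted line fit."""
    # Gaussian kernel weights: observations near x0 count more.
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    # Weighted moments of the design, centred at x0.
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    # Solving the 2x2 weighted normal equations for the intercept of the
    # local line gives the estimate at x0.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
print(local_linear_fit(xs, ys, 2.5, bandwidth=1.0))
```

Because the weights ignore any serial correlation in the errors, this estimator corresponds to the "working-independent" structure that the abstract reports improving upon.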
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
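The two fits named in this abstract have closed-form slopes that are easy to compare with ordinary least squares. The following is a sketch under standard textbook formulas (major axis slope from the principal axis of the sample covariance, reduced major axis slope as the ratio of standard deviations with the sign of the covariance); the function name and the toy data are hypothetical, and the covariance is assumed to be nonzero.

```python
import math

def axis_slopes(xs, ys):
    """Return (OLS, major axis, reduced major axis) slopes of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Ordinary least squares minimises vertical distances.
    ols = sxy / sxx
    # Major axis minimises perpendicular (orthogonal) distances.
    major = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # Reduced major axis: ratio of standard deviations, signed by the covariance.
    rma = math.copysign(math.sqrt(syy / sxx), sxy)
    return ols, major, rma

# Hypothetical scattered data.
print(axis_slopes([0, 1, 2, 3], [0, 2, 1, 3]))
```

All three lines pass through the point of means, so each intercept follows as the mean of y minus the slope times the mean of x.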
Motulsky, Harvey J; Brown, Ronald E
2006-01-01
Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate simple linear regression. The first one is based on Galton's famous data set on heredity. We use the R command lm to obtain coefficient estimates, the standard error of the error, R2, residuals … In the second example, devoted to data on the vapor tension of mercury, we fit a simple linear regression, predict values, and look ahead to multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
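The quantities listed in this session (coefficient estimates, residual standard error, R2, residuals) all follow from a few sums. The sketch below computes them in plain Python rather than with R's lm; the function name and the small data set are hypothetical and are not Galton's data.

```python
import math

def simple_ols(xs, ys):
    """Closed-form simple linear regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    rss = sum(e * e for e in residuals)          # residual sum of squares
    tss = sum((y - my) ** 2 for y in ys)         # total sum of squares
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1 - rss / tss,
        # Residual standard error on n - 2 degrees of freedom.
        "sigma": math.sqrt(rss / (n - 2)),
        "residuals": residuals,
    }

# Hypothetical data for illustration only.
fit = simple_ols([1, 2, 3, 4], [2, 4, 5, 8])
print(fit["slope"], fit["intercept"], fit["r2"], fit["sigma"])
```

A prediction at a new x is then the intercept plus the slope times x.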
Identification of internal properties of fibers and micro-swimmers
NASA Astrophysics Data System (ADS)
Plouraboue, Franck; Thiam, Ibrahima; Delmotte, Blaise; Climent, Eric; PSC Collaboration
2016-11-01
In this presentation we discuss the identifiability of constitutive parameters of passive or active micro-swimmers. We first present a general framework for describing fibers or micro-swimmers using a bead-model description. Using a kinematic constraint formulation to describe fibers, flagella or cilia, we find an explicit linear relationship between elastic constitutive parameters and generalised velocities from computing contact forces. This linear formulation then makes it possible to address identifiability conditions explicitly and to solve for parameter identification. We show that active forcing and passive parameters are each identifiable independently but not simultaneously. We also provide unbiased estimators for elastic parameters as well as active ones in the presence of Langevin-like forcing with Gaussian noise using normal linear regression models and the maximum likelihood method. These theoretical results are illustrated in various configurations of relaxed or actuated passive fibers, and active filaments of known passive properties, showing the efficiency of the proposed approach for direct parameter identification. The convergence of the proposed estimators is successfully tested numerically.
Krider, Lori A.; Magner, Joseph A.; Perry, Jim; Vondracek, Bruce C.; Ferrington, Leonard C.
2013-01-01
Carbonate-sandstone geology in southeastern Minnesota creates a heterogeneous landscape of springs, seeps, and sinkholes that supply groundwater into streams. Air temperatures are effective predictors of water temperature in surface-water dominated streams. However, no published work investigates the relationship between air and water temperatures in groundwater-fed streams (GWFS) across watersheds. We used simple linear regressions to examine weekly air-water temperature relationships for 40 GWFS in southeastern Minnesota. A 40-stream, composite linear regression model has a slope of 0.38, an intercept of 6.63, and R2 of 0.83. The regression models for GWFS have lower slopes and higher intercepts in comparison to surface-water dominated streams. Regression models for streams with high R2 values offer promise for use as predictive tools for future climate conditions. Climate change is expected to alter the thermal regime of groundwater-fed systems, but will do so at a slower rate than surface-water dominated systems. A regression model of intercept vs. slope can be used to identify streams for which water temperatures are more meteorologically than groundwater controlled, and thus more vulnerable to climate change. Such relationships can be used to guide restoration vs. management strategies to protect trout streams.
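As a worked illustration of the composite model reported above (slope 0.38, intercept 6.63), the fitted line can be evaluated for a given weekly air temperature. The abstract does not state the temperature units, so the sketch assumes air and water temperatures share one unit (presumably degrees Celsius); the function name is hypothetical.

```python
# Coefficients of the 40-stream composite model, as reported in the abstract.
SLOPE = 0.38
INTERCEPT = 6.63

def predict_water_temp(air_temp):
    """Predicted weekly water temperature for a given weekly air temperature."""
    return INTERCEPT + SLOPE * air_temp

# A low slope means water temperature responds weakly to air temperature:
# each one-unit rise in air temperature raises the prediction by only 0.38.
for air in (0.0, 10.0, 20.0):
    print(air, round(predict_water_temp(air), 2))
```

The combination of a low slope and a high intercept is what the abstract describes as characteristic of groundwater-fed streams compared with surface-water dominated ones.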
Morse Code, Scrabble, and the Alphabet
ERIC Educational Resources Information Center
Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss
2004-01-01
In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
Mixed models, linear dependency, and identification in age-period-cohort models.
O'Brien, Robert M
2017-07-20
This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
2014-01-01
Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Pattern Recognition Analysis of Age-Related Retinal Ganglion Cell Signatures in the Human Eye
Yoshioka, Nayuta; Zangerl, Barbara; Nivison-Smith, Lisa; Khuu, Sieu K.; Jones, Bryan W.; Pfeiffer, Rebecca L.; Marc, Robert E.; Kalloniatis, Michael
2017-01-01
Purpose To characterize macular ganglion cell layer (GCL) changes with age and provide a framework to assess changes in ocular disease. This study used data clustering to analyze macular GCL patterns from optical coherence tomography (OCT) in a large cohort of subjects without ocular disease. Methods Single eyes of 201 patients evaluated at the Centre for Eye Health (Sydney, Australia) were retrospectively enrolled (age range, 20–85); 8 × 8 grid locations obtained from Spectralis OCT macular scans were analyzed with unsupervised classification into statistically separable classes sharing common GCL thickness and change with age. The resulting classes and gridwise data were fitted with linear and segmented linear regression curves. Additionally, normalized data were analyzed to determine regression as a percentage. Accuracy of each model was examined through comparison of predicted 50-year-old equivalent macular GCL thickness for the entire cohort to a true 50-year-old reference cohort. Results Pattern recognition clustered GCL thickness across the macula into five to eight spatially concentric classes. F-test demonstrated segmented linear regression to be the most appropriate model for macular GCL change. The pattern recognition–derived and normalized model revealed less difference between the predicted macular GCL thickness and the reference cohort (average ± SD 0.19 ± 0.92 and −0.30 ± 0.61 μm) than a gridwise model (average ± SD 0.62 ± 1.43 μm). Conclusions Pattern recognition successfully identified statistically separable macular areas that undergo a segmented linear reduction with age. This regression model better predicted macular GCL thickness. The various unique spatial patterns revealed by pattern recognition combined with core GCL thickness data provide a framework to analyze GCL loss in ocular disease. PMID:28632847
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
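The two existing approaches that the abstract contrasts can be sketched side by side: classical calibration fits the measurement on the reference value and inverts the line, while inverse calibration regresses the reference value directly on the measurement. This sketch does not implement the paper's "reversed inverse regression", whose details are not given in the abstract; the function names and data are hypothetical.

```python
def fit_line(us, vs):
    """Least squares line v = a + b*u; returns (a, b)."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    b = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / sum((u - mu) ** 2 for u in us)
    return mv - b * mu, b

def classical_estimate(x_std, y_obs, y_new):
    """Fit y on x using the standards, then invert the line at y_new."""
    a, b = fit_line(x_std, y_obs)
    return (y_new - a) / b

def inverse_estimate(x_std, y_obs, y_new):
    """Regress x on y directly and evaluate the line at y_new."""
    c, d = fit_line(y_obs, x_std)
    return c + d * y_new

# Hypothetical calibration standards and instrument readings.
x_std = [1, 2, 3, 4]
y_obs = [2, 4, 5, 8]
print(classical_estimate(x_std, y_obs, 5.7), inverse_estimate(x_std, y_obs, 5.7))
```

The two estimates generally differ unless the data lie exactly on a line, which is one way to see why the choice of approach matters in calibration.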
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
Tsai, Hsin-Jung; Kuo, Terry B J; Lin, Yu-Cheng; Yang, Cheryl C H
2015-12-30
A blunting of heart rate (HR) reduction during sleep has been reported to be associated with increased all-cause mortality. An increased incidence of cardiovascular events has been observed in patients with insomnia but the relationship between nighttime HR and insomnia remains unclear. Here we investigated the HR patterns during the sleep onset period and their association with the length of sleep onset latency (SOL). Nineteen sleep-onset insomniacs (SOI) and 14 good sleepers had their sleep analyzed. Linear regression and nonlinear Hilbert-Huang transform (HHT) of the HR slope were performed in order to analyze HR dynamics during the sleep onset period. A significant depression in HR fluctuation was identified among the SOI group during the sleep onset period when linear regression and HHT analysis were applied. The magnitude of the HR reduction was associated with both polysomnography-defined and subjective SOL; moreover, we found that the linear regression and HHT slopes of the HR showed great sensitivity with respect to sleep quality. Our findings indicate that HR dynamics during the sleep onset period are sensitive to sleep initiation difficulty and respond to the SOL, which suggests that the presence of autonomic dysfunction affects the progress of falling asleep. Copyright © 2015. Published by Elsevier Ireland Ltd.
Predictors of Grades for Black Americans in a Non-Calculus, Preprofessional Physics Sequence.
ERIC Educational Resources Information Center
Vincent, Harold A.; And Others
Variables to predict grades in a noncalculus, preprofessional college physics course at Xavier University of Louisiana, a historically-black institution, were identified using linear regression. The two-semester, noncalculus physics course emphasizes the application of physics in the health professions. The study population consisted of 123…
Impact of Preadmission Variables on USMLE Step 1 and Step 2 Performance
ERIC Educational Resources Information Center
Kleshinski, James; Khuder, Sadik A.; Shapiro, Joseph I.; Gold, Jeffrey P.
2009-01-01
Purpose: To examine the predictive ability of preadmission variables on United States Medical Licensing Examinations (USMLE) step 1 and step 2 performance, incorporating the use of a neural network model. Method: Preadmission data were collected on matriculants from 1998 to 2004. Linear regression analysis was first used to identify predictors of…
Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach
ERIC Educational Resources Information Center
Mesic, Vanes; Muratovic, Hasnija
2011-01-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
EMI-Sensor Data to Identify Areas of Manure Accumulation on a Feedlot Surface
USDA-ARS's Scientific Manuscript database
A study was initiated to test the validity of using electromagnetic induction (EMI) survey data, a prediction-based sampling strategy and ordinary linear regression modeling to predict spatially variable feedlot surface manure accumulation. A 30 m × 60 m feedlot pen with a central mound was selecte...
What Is the Relationship between Teacher Quality and Student Achievement? An Exploratory Study
ERIC Educational Resources Information Center
Stronge, James H.; Ward, Thomas J.; Tucker, Pamela D.; Hindman, Jennifer L.
2007-01-01
The major purpose of the study was to examine what constitutes effective teaching as defined by measured increases in student learning with a focus on the instructional behaviors and practices. Ordinary least squares (OLS) regression analyses and hierarchical linear modeling (HLM) were used to identify teacher effectiveness levels while…
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to compare the applicability of the orthogonal projections to latent structures (OPLS) statistical model with that of traditional linear regression in investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Quality of life in breast cancer patients--a quantile regression analysis.
Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma
2008-01-01
Quality of life studies have an important role in health care, especially in chronic diseases, in clinical judgment and in the supply of medical resources. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients using a quantile regression model and to compare the results with linear regression. A cross-sectional study was conducted on 119 breast cancer patients who were admitted and treated in the chemotherapy ward of Namazi hospital in Shiraz. We used the QLQ-C30 questionnaire to assess quality of life in these patients. A quantile regression was employed to assess the associated factors and the results were compared to linear regression. All analyses were carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In contrast to linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for the first quartile. Also, emotional functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of breast cancer patients' quality of life.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Balabin, Roman M; Smirnov, Sergey V
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. 
The results obtained with other spectroscopic techniques, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopy, can be greatly improved by an appropriate choice of feature selection method. Copyright © 2011 Elsevier B.V. All rights reserved.
Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.
Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T
2016-12-01
We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmol/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
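The contrast between a linear and an S-shaped Hill dose response can be made concrete with the usual Hill form, response = e_max * dose^n / (k^n + dose^n). The sketch below uses that standard form with purely hypothetical parameter values; they are not the fitted values from this meta-regression, which the abstract does not report.

```python
def hill_response(dose, e_max, k, n):
    """Hill curve: rises slowly at low dose, half-maximal at dose == k."""
    return e_max * dose ** n / (k ** n + dose ** n)

def linear_response(dose, slope):
    """Straight-line dose response through the origin."""
    return slope * dose

# Hypothetical parameters chosen only to show the shapes.
E_MAX, K, N, SLOPE = 0.4, 2.0, 3.0, 0.1

for dose in (0.5, 1.0, 2.0, 4.0):
    print(dose, round(hill_response(dose, E_MAX, K, N), 4),
          round(linear_response(dose, SLOPE), 4))
```

With an exponent above 1 the Hill curve stays close to zero at low doses, which is the region where the abstract reports that the linear assumption fits poorly.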
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomenon. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model as a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance, and results show that the proposed method, Sparse-BMFLC, has a high mean energy ratio (0.9976) and outperforms existing methods such as the short-time Fourier transform (STFT), the continuous wavelet transform (CWT) and the BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
Seeking maximum linearity of transfer functions
NASA Astrophysics Data System (ADS)
Silva, Filipi N.; Comin, Cesar H.; Costa, Luciano da F.
2016-12-01
Linearity is an important and frequently sought property in electronics and instrumentation. Here, we report a method capable of, given a transfer function (theoretical or derived from some real system), identifying the respective most linear region of operation with a fixed width. This methodology, which is based on least squares regression and systematic consideration of all possible regions, has been illustrated with respect to both an analytical (sigmoid transfer function) and a simple situation involving experimental data of a low-power, one-stage class A transistor current amplifier. Such an approach, which has been addressed in terms of transfer functions derived from experimentally obtained characteristic surface, also yielded contributions such as the estimation of local constants of the device, as opposed to typically considered average values. The reported method and results pave the way to several further applications in other types of devices and systems, intelligent control operation, and other areas such as identifying regions of power law behavior.
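The exhaustive search described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sum of squared residuals as the linearity score, and the restriction of the sigmoid to the interval [-3, 3] are all assumptions made here. The restriction matters because the flat saturation tails of a sigmoid are also nearly linear and would otherwise compete with the central region.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def most_linear_window(xs, ys, width):
    """Try every window of `width` consecutive samples and keep the one
    whose least-squares line leaves the smallest sum of squared residuals."""
    best = None
    for start in range(len(xs) - width + 1):
        a, b, sse = fit_line(xs[start:start + width], ys[start:start + width])
        if best is None or sse < best[3]:
            best = (start, a, b, sse)
    return best

# Hypothetical transfer function: a logistic sigmoid sampled on [-3, 3].
xs = [-3 + 0.1 * i for i in range(61)]
ys = [1 / (1 + math.exp(-x)) for x in xs]

# A window of 21 samples spans 2.0 units of the input axis.
start, a, b, sse = most_linear_window(xs, ys, width=21)
print(f"most linear region: [{xs[start]:.1f}, {xs[start + 20]:.1f}], slope {b:.4f}")
```

The fitted slope of the selected window is a local estimate of the device gain, which is the kind of local constant the abstract contrasts with average values.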
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than a linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for β in the linear model y = αx + β, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models through the origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for β and the null hypothesis that β = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. 
The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km² or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
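The two competing fits in this abstract can be compared with a short sketch. The data points below are hypothetical and chosen only for illustration, and the function names are assumptions; only the coefficient 3.26 comes from the abstract.

```python
def fit_through_origin(x, y):
    """Least-squares slope for y = b*x: b = sum(x*y) / sum(x*x)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def fit_with_intercept(x, y):
    """Least-squares fit for y = intercept + slope*x; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def density_from_tracks(track_density, slope=3.26):
    """Invert 'track density = slope * carnivore density' (slope from the abstract)."""
    return track_density / slope

# Hypothetical survey data: carnivore density and observed track density.
carnivores = [0.5, 1.0, 2.0, 3.0, 4.0]
tracks = [1.7, 3.2, 6.6, 9.7, 13.1]

slope_origin = fit_through_origin(carnivores, tracks)
intercept, slope_full = fit_with_intercept(carnivores, tracks)
print(f"through origin: tracks = {slope_origin:.3f} * density")
print(f"with intercept: tracks = {intercept:.3f} + {slope_full:.3f} * density")
print(f"estimated density for 6.52 tracks: {density_from_tracks(6.52):.2f}")
```

When the fitted intercept is close to zero, as in this made-up data set, dropping it costs little in fit and saves one estimated parameter, which is the trade-off the information criterion in the abstract evaluates.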
NASA Technical Reports Server (NTRS)
Colwell, R. N. (Principal Investigator)
1983-01-01
The geometric quality of the TM and MSS film products were evaluated by making selective photo measurements such as scale, linear and area determinations; and by measuring the coordinates of known features on both the film products and map products and then relating these paired observations using a standard linear least squares regression approach. Quantitative interpretation tests are described which evaluate the quality and utility of the TM film products and various band combinations for detecting and identifying important forest and agricultural features.
From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as its first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and is equivalent to the "Y" value when "x" equals 0, and "b" (also called the slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of independent variables in the outcome.
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Yokoo, Takeshi; Serai, Suraj D; Pirasteh, Ali; Bashir, Mustafa R; Hamilton, Gavin; Hernando, Diego; Hu, Houchun H; Hetterich, Holger; Kühn, Jens-Peter; Kukuk, Guido M; Loomba, Rohit; Middleton, Michael S; Obuchowski, Nancy A; Song, Ji Soo; Tang, An; Wu, Xinhuai; Reeder, Scott B; Sirlin, Claude B
2018-02-01
Purpose To determine the linearity, bias, and precision of hepatic proton density fat fraction (PDFF) measurements by using magnetic resonance (MR) imaging across different field strengths, imager manufacturers, and reconstruction methods. Materials and Methods This meta-analysis was performed in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A systematic literature search identified studies that evaluated the linearity and/or bias of hepatic PDFF measurements by using MR imaging (hereafter, MR imaging-PDFF) against PDFF measurements by using colocalized MR spectroscopy (hereafter, MR spectroscopy-PDFF) or the precision of MR imaging-PDFF. The quality of each study was evaluated by using the Quality Assessment of Studies of Diagnostic Accuracy 2 tool. De-identified original data sets from the selected studies were pooled. Linearity was evaluated by using linear regression between MR imaging-PDFF and MR spectroscopy-PDFF measurements. Bias, defined as the mean difference between MR imaging-PDFF and MR spectroscopy-PDFF measurements, was evaluated by using Bland-Altman analysis. Precision, defined as the agreement between repeated MR imaging-PDFF measurements, was evaluated by using a linear mixed-effects model, with field strength, imager manufacturer, reconstruction method, and region of interest as random effects. Results Twenty-three studies (1679 participants) were selected for linearity and bias analyses and 11 studies (425 participants) were selected for precision analyses. MR imaging-PDFF was linear with MR spectroscopy-PDFF (R² = 0.96). Regression slope (0.97; P < .001) and mean Bland-Altman bias (-0.13%; 95% limits of agreement: -3.95%, 3.40%) indicated minimal underestimation by using MR imaging-PDFF. MR imaging-PDFF was precise at the region-of-interest level, with repeatability and reproducibility coefficients of 2.99% and 4.12%, respectively. 
Field strength, imager manufacturer, and reconstruction method each had minimal effects on reproducibility. Conclusion MR imaging-PDFF has excellent linearity, bias, and precision across different field strengths, imager manufacturers, and reconstruction methods. © RSNA, 2017 Online supplemental material is available for this article. An earlier incorrect version of this article appeared online. This article was corrected on October 2, 2017.
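The bias measure used in this abstract, the mean difference between two methods with its 95% limits of agreement, can be sketched in a few lines. This is the textbook Bland-Altman calculation (bias ± 1.96 standard deviations of the differences), which may differ from the pooled procedure the authors applied; the paired values below are hypothetical.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements,
    where bias is the mean of a - b and the limits are bias +/- 1.96 SD."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired fat-fraction readings (percent) from two methods.
mri_pdff = [5.1, 10.2, 14.8, 20.3, 24.6]
mrs_pdff = [5.0, 10.5, 15.0, 20.0, 25.0]

bias, lower, upper = bland_altman(mri_pdff, mrs_pdff)
print(f"bias {bias:.2f}%, 95% limits of agreement ({lower:.2f}%, {upper:.2f}%)")
```

A negative bias means the first method reads lower on average than the second, which is the direction of the small underestimation reported above.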
Grantz, Erin; Haggard, Brian; Scott, J Thad
2018-06-12
We calculated four median datasets (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4) for approximately 100 Texas, USA reservoirs. Trend analyses of differences between dataset 1 and 3 medians indicated percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a- and transparency-TP relationships indicated threshold differences up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution or missing due to limitations of statistical methods biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in linear regression models relating transparency-TP between datasets 1, 2, and the more statistically robust datasets 3-4. Study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low as 16% for multiple QLs. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
An Expert System for the Evaluation of Cost Models
1990-09-01
contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
Wu, F; Callisaya, M; Laslett, L L; Wills, K; Zhou, Y; Jones, G; Winzenberg, T
2016-07-01
This was the first study investigating both linear associations between lower limb muscle strength and balance in middle-aged women and the potential for thresholds for the associations. There was strong evidence that even in middle-aged women, poorer LMS was associated with reduced balance. However, no evidence was found for thresholds. Decline in balance begins in middle age, yet, the role of muscle strength in balance is rarely examined in this age group. We aimed to determine the association between lower limb muscle strength (LMS) and balance in middle-aged women and investigate whether cut-points of LMS exist that might identify women at risk of poorer balance. Cross-sectional analysis of 345 women aged 36-57 years was done. Associations between LMS and balance tests (timed up and go (TUG), step test (ST), functional reach test (FRT), and lateral reach test (LRT)) were assessed using linear regression. Nonlinear associations were explored using locally weighted regression smoothing (LOWESS) and potential cut-points identified using nonlinear least-squares estimation. Segmented regression was used to estimate associations above and below the identified cut-points. Weaker LMS was associated with poorer performance on the TUG (β -0.008 (95 % CI: -0.010, -0.005) second/kg), ST (β 0.031 (0.011, 0.051) step/kg), FRT (β 0.071 (0.047, 0.096) cm/kg), and LRT (β 0.028 (0.011, 0.044) cm/kg), independent of confounders. Potential nonlinear associations were evident from LOWESS results; significant cut-points of LMS were identified for all balance tests (29-50 kg). However, excepting ST, cut-points did not persist after excluding potentially influential data points. In middle-aged women, poorer LMS is associated with reduced balance. Therefore, improving muscle strength in middle-age may be a useful strategy to improve balance and reduce falls risk in later life. 
Middle-aged women with low muscle strength may be an effective target group for future randomized controlled trials. Australian New Zealand Clinical Trials Registry (ANZCTR) NCT00273260.
Chen, Sung-Wei; Wang, Po-Chuan; Hsin, Ping-Lung; Oates, Anthony; Sun, I-Wen; Liu, Shen-Ing
2011-01-01
Microelectronic engineers are considered valuable human capital contributing significantly toward economic development, but they may encounter stressful work conditions in the context of a globalized industry. The study aims at identifying risk factors of depressive disorders primarily based on job stress models, the Demand-Control-Support and Effort-Reward Imbalance models, and at evaluating whether depressive disorders impair work performance in microelectronics engineers in Taiwan. The case-control study was conducted among 678 microelectronics engineers, 452 controls and 226 cases with depressive disorders, which were defined by a score of 17 or more on the Beck Depression Inventory and a psychiatrist's diagnosis. The self-administered questionnaires included the Job Content Questionnaire, Effort-Reward Imbalance Questionnaire, demography, psychosocial factors, health behaviors and work performance. Hierarchical logistic regression was applied to identify risk factors of depressive disorders. Multivariate linear regressions were used to determine factors affecting work performance. By hierarchical logistic regression, risk factors of depressive disorders are high demands, low work social support, high effort/reward ratio and low frequency of physical exercise. Combining the two job stress models may have better predictive power for depressive disorders than adopting either model alone. Three multivariate linear regressions provide similar results indicating that depressive disorders are associated with impaired work performance in terms of absence, role limitation and social functioning limitation. The results may provide insight into the applicability of job stress models in a globalized high-tech industry considerably focused in non-Western countries, and the design of workplace preventive strategies for depressive disorders in Asian electronics engineering population.
Specific factors for prenatal lead exposure in the border area of China.
Kawata, Kimiko; Li, Yan; Liu, Hao; Zhang, Xiao Qin; Ushijima, Hiroshi
2006-07-01
The objectives of this study are to examine the prevalence of increased blood lead concentrations in mothers and their umbilical cords, and to identify risk factors for prenatal lead exposure in Kunming city, Yunnan province, China. The study was conducted at two obstetrics departments, and 100 peripartum women were enrolled. The mean blood lead concentrations of the mothers and the umbilical cords were 67.3 µg/l and 53.1 µg/l, respectively. In multiple linear regression analysis, maternal occupational exposure, maternal consumption of homemade dehydrated vegetables and maternal habitation period in Kunming city were significantly associated with an increase of umbilical cord blood lead concentration. In addition, logistic regression analysis was used to assess the association of umbilical cord blood lead concentrations that possibly have adverse effects on brain development of newborns with each potential risk factor. Maternal frequent use of tableware with color patterns inside was significantly associated with higher cord blood lead concentration in addition to the three items in the multiple linear regression analysis. These points should be considered as specific recommendations for maternal and fetal lead exposure in this city.
The Relationship between Religious Coping and Self-Care Behaviors in Iranian Medical Students.
Sharif Nia, Hamid; Pahlevan Sharif, Saeed; Goudarzian, Amir Hossein; Allen, Kelly A; Jamali, Saman; Heydari Gorji, Mohammad Ali
2017-12-01
In recent years, researchers have identified that coping strategies are an important contributor to an individual's life satisfaction and ability to manage stress. The positive relationship between religious copings, specifically, with physical and mental health has also been identified in some studies. Spirituality and religion have been discussed rigorously in research, but very few studies exist on religious coping. The aim of this study was to determine the relationship between religious coping methods (i.e., positive and negative religious coping) and self-care behaviors in Iranian medical students. This study used a cross-sectional design of 335 randomly selected students from Mazandaran University of Medical Sciences, Iran. A data collection tool comprised of the standard questionnaire of religious coping methods and questionnaire of self-care behaviors assessment was utilized. Data were analyzed using a two-sample t test assuming equal variances. Adjusted linear regression was used to evaluate the independent association of religious copings with self-care. Adjusted linear regression model indicated an independent significant association between positive (b = 4.616, 95% CI 4.234-4.999) and negative (b = -3.726, 95% CI -4.311 to -3.141) religious coping with self-care behaviors. Findings showed a linear relationship between religious coping and self-care behaviors. Further research with larger sample sizes in diverse populations is recommended.
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Control Variate Selection for Multiresponse Simulation.
1987-05-01
M. H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F., Probability, Allyn and Bacon...1982. Neter, J., W. Wasserman, and M. H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F...Aspects of Multivariate Statistical Theory, John Wiley and Sons, New York, New York, 1982. Neter, J., W. Wasserman, and M. H. Kutner, Applied Linear Regression Models
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Nam, R K; Klotz, L H; Jewett, M A; Danjoux, C; Trachtenberg, J
1998-01-01
To study the rate of change in prostate specific antigen (PSA velocity) in patients with prostate cancer initially managed by 'watchful waiting'. Serial PSA levels were determined in 141 patients with prostate cancer confirmed by biopsy, who were initially managed expectantly and enrolled between May 1990 and December 1995. Sixty-seven patients eventually underwent surgery (mean age 59 years) because they chose it (the decision for surgery was not based on PSA velocity). A cohort of 74 patients remained on 'watchful waiting' (mean age 69 years). Linear regression and logarithmic transformations were used to segregate those patients who showed a rapid rise, defined as a > 50% rise in PSA per year (or a doubling time of < 2 years) and designated 'rapid risers'. An initial analysis based on a minimum of two PSA values showed that 31% were rapid risers. Only 15% of patients with more than three serial PSA determinations over ≥ 6 months showed a rapid rise in PSA level. There was no advantage of log-linear analysis over linear regression models. Three serial PSA determinations over ≥ 6 months in patients with clinically localized prostate cancer identify a subset (15%) of patients with a rapidly rising PSA level. Shorter PSA surveillance with fewer PSA values may falsely identify patients with rapid rises in PSA level. However, further follow-up is required to determine if a rapid rise in PSA level identifies a subset of patients with an aggressive biological phenotype who are either still curable or who have already progressed to incurability through metastatic disease.
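The log-linear analysis mentioned in this abstract can be illustrated with a short sketch. It assumes exponential PSA growth, so that the slope k of ln(PSA) against time in years gives an annual rise of exp(k) - 1 and a doubling time of ln(2)/k. The function names and the PSA series are hypothetical, and the flag below applies only the "> 50% rise per year" wording of the definition.

```python
import math

def log_linear_slope(times_years, psa_values):
    """Least-squares slope of ln(PSA) against time in years."""
    logs = [math.log(p) for p in psa_values]
    n = len(times_years)
    mt = sum(times_years) / n
    ml = sum(logs) / n
    sxx = sum((t - mt) ** 2 for t in times_years)
    sxy = sum((t - mt) * (l - ml) for t, l in zip(times_years, logs))
    return sxy / sxx

def psa_kinetics(times_years, psa_values):
    """Return (fractional annual rise, doubling time in years)."""
    k = log_linear_slope(times_years, psa_values)
    annual_rise = math.exp(k) - 1.0
    doubling_time = math.log(2) / k if k > 0 else math.inf
    return annual_rise, doubling_time

# Hypothetical series: PSA doubling every year, sampled every six months.
times = [0.0, 0.5, 1.0, 1.5]
psa = [4.0 * 2 ** t for t in times]

annual_rise, doubling_time = psa_kinetics(times, psa)
is_rapid_riser = annual_rise > 0.5  # "> 50% rise in PSA per year"
print(f"annual rise {annual_rise:.0%}, doubling time {doubling_time:.2f} years")
```

Under this exponential assumption a 50% annual rise corresponds to a doubling time of ln(2)/ln(1.5), roughly 1.7 years, so the two thresholds quoted in the abstract are close but not identical.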
High correlations between MRI brain volume measurements based on NeuroQuant® and FreeSurfer.
Ross, David E; Ochs, Alfred L; Tate, David F; Tokac, Umit; Seabaugh, John; Abildskov, Tracy J; Bigler, Erin D
2018-05-30
NeuroQuant® (NQ) and FreeSurfer (FS) are commonly used computer-automated programs for measuring MRI brain volume. Previously they were reported to have high intermethod reliabilities but often large intermethod effect size differences. We hypothesized that linear transformations could be used to reduce the large effect sizes. This study was an extension of our previously reported study. We performed NQ and FS brain volume measurements on 60 subjects (including normal controls, patients with traumatic brain injury, and patients with Alzheimer's disease). We used two statistical approaches in parallel to develop methods for transforming FS volumes into NQ volumes: traditional linear regression, and Bayesian linear regression. For both methods, we used regression analyses to develop linear transformations of the FS volumes to make them more similar to the NQ volumes. The FS-to-NQ transformations based on traditional linear regression resulted in effect sizes which were small to moderate. The transformations based on Bayesian linear regression resulted in all effect sizes being trivially small. To our knowledge, this is the first report describing a method for transforming FS to NQ data so as to achieve high reliability and low effect size differences. Machine learning methods like Bayesian regression may be more useful than traditional methods. Copyright © 2018 Elsevier B.V. All rights reserved.
FPGA implementation of predictive degradation model for engine oil lifetime
NASA Astrophysics Data System (ADS)
Idros, M. F. M.; Razak, A. H. A.; Junid, S. A. M. Al; Suliman, S. I.; Halim, A. K.
2018-03-01
This paper presents the implementation of a linear regression model for degradation prediction at the register-transfer level (RTL) using Quartus II. A stationary model was identified in the degradation trend of the engine oil in a vehicle using a time series method. For the RTL implementation, the degradation model is written in Verilog HDL and the input data are sampled at set times. A clock divider was designed to support the timing sequence of the input data. For every five data points, a regression analysis is performed to determine the slope variation and to calculate the prediction. Only negative values are considered for prediction, which reduces the number of logic gates. The least squares method is used to obtain the best linear model based on the mean values of the time series data. The coded algorithm was implemented on an FPGA for validation. The result shows the predicted time to change the engine oil.
Quantile Regression in the Study of Developmental Sciences
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
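The contrast between estimating an average and estimating a point in the outcome's distribution comes down to the loss function. Quantile regression minimizes the check (pinball) loss, and for a model with no predictors its minimizer is the sample quantile. The sketch below shows only this intercept-only case, with hypothetical data and function names; it is not a full quantile regression fit.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def total_loss(values, prediction, tau):
    """Summed check loss of a single constant prediction over all values."""
    return sum(pinball_loss(v - prediction, tau) for v in values)

def best_constant(values, tau):
    """Constant prediction, chosen among the observed values, with the lowest loss."""
    return min(values, key=lambda c: total_loss(values, c, tau))

# Hypothetical outcome with one extreme score.
scores = [1, 2, 3, 4, 100]

mean_fit = sum(scores) / len(scores)      # what least squares targets
median_fit = best_constant(scores, 0.5)   # tau = 0.5 targets the median
upper_fit = best_constant(scores, 0.9)    # tau = 0.9 targets the upper tail
print(mean_fit, median_fit, upper_fit)
```

Replacing the constant with a linear function of the predictors and minimizing the same loss gives one regression line per chosen quantile, which is how the differential inferences described above arise.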
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
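Deming regression differs from simple linear regression in that it allows measurement error in both assays. A minimal sketch of the usual closed-form estimate follows; the parameter `delta` (ratio of the y-error variance to the x-error variance, 1.0 giving orthogonal regression), the function name and the data are assumptions for illustration, not details taken from the manuscript. In the fitted line, an intercept away from 0 suggests constant error and a slope away from 1 suggests proportional error.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Deming regression y = intercept + slope*x.
    delta is the assumed ratio var(y errors) / var(x errors).
    Requires a nonzero covariance between x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    term = syy - delta * sxx
    slope = (term + math.sqrt(term ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope

# Hypothetical comparison: the new assay reads twice the reference plus 1.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
new_assay = [2.0 * r + 1.0 for r in reference]

intercept, slope = deming_fit(reference, new_assay)
print(f"constant error (intercept): {intercept:.2f}")
print(f"proportional error (slope): {slope:.2f}")
```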
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. 
These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear methods. The non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie pseudo second order models were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression methods showed that Ho's pseudo second order model was a better kinetic expression than the other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho's pseudo second order expression, and the predicted values were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear methods to obtain the pseudo isotherms. The best fitting pseudo isotherms were found to be the Langmuir and Redlich Peterson isotherms. The Redlich Peterson isotherm reduces to the Langmuir isotherm when the constant g equals unity.
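The abstract gives no equations. As a sketch only, the code below assumes the commonly cited form of Ho's pseudo second order model, q_t = k q_e^2 t / (1 + k q_e t), and its linearization t/q_t = 1/(k q_e^2) + t/q_e. The parameter values and data are synthetic and noise-free, so the linear fit recovers them exactly; with noisy measurements the transformation changes the error structure, which is one plausible reason a non-linear fit can perform better.

```python
def ho_uptake(t, q_e, k):
    """Pseudo second order uptake q_t = k*q_e^2*t / (1 + k*q_e*t)."""
    return k * q_e**2 * t / (1.0 + k * q_e * t)


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - b * x_bar, b


# Synthetic, noise-free data from assumed parameters (illustrative only).
true_q_e, true_k = 50.0, 0.002
times = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
uptake = [ho_uptake(t, true_q_e, true_k) for t in times]

# Linearized form: t/q_t = 1/(k*q_e^2) + t/q_e
intercept, slope = fit_line(times, [t / q for t, q in zip(times, uptake)])
q_e_hat = 1.0 / slope            # slope = 1/q_e
k_hat = slope**2 / intercept     # intercept = 1/(k*q_e^2)
```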
Investigating bias in squared regression structure coefficients
Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce
2015-01-01
The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273
2015-07-15
Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST... "Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors
Prediction of the Main Engine Power of a New Container Ship at the Preliminary Design Stage
NASA Astrophysics Data System (ADS)
Cepowski, Tomasz
2017-06-01
The paper presents mathematical relationships that allow us to forecast the estimated main engine power of new container ships, based on data concerning vessels built in 2005-2015. The presented approximations allow us to estimate the engine power based on the length between perpendiculars and the number of containers the ship will carry. The approximations were developed using simple linear regression and multivariate linear regression analysis. The presented relations have practical application for estimation of container ship engine power needed in preliminary parametric design of the ship. It follows from the above that the use of multiple linear regression to predict the main engine power of a container ship brings more accurate solutions than simple linear regression.
Properties of added variable plots in Cox's regression model.
Lindkvist, M
2000-03-01
The added variable plot is useful for examining the effect of a covariate in regression models. The plot provides information regarding the inclusion of a covariate, and is useful in identifying influential observations on the parameter estimates. Hall et al. (1996) proposed a plot for Cox's proportional hazards model derived by regarding the Cox model as a generalized linear model. This paper proves and discusses properties of this plot. These properties make the plot a valuable tool in model evaluation. Quantities considered include parameter estimates, residuals, leverage, case influence measures and correspondence to previously proposed residuals and diagnostics.
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Ernst, Anja F; Albers, Casper J
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.
Ernst, Anja F.
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971
Hays, Ron D; Revicki, Dennis A; Feeny, David; Fayers, Peter; Spritzer, Karen L; Cella, David
2016-10-01
Preference-based health-related quality of life (HR-QOL) scores are useful as outcome measures in clinical studies, for monitoring the health of populations, and for estimating quality-adjusted life-years. This was a secondary analysis of data collected in an internet survey as part of the Patient-Reported Outcomes Measurement Information System (PROMIS(®)) project. To estimate Health Utilities Index Mark 3 (HUI-3) preference scores, we used the ten PROMIS(®) global health items, the PROMIS-29 V2.0 single pain intensity item and seven multi-item scales (physical functioning, fatigue, pain interference, depressive symptoms, anxiety, ability to participate in social roles and activities, sleep disturbance), and the PROMIS-29 V2.0 items. Linear regression analyses were used to identify significant predictors, followed by simple linear equating to avoid regression to the mean. The regression models explained 48 % (global health items), 61 % (PROMIS-29 V2.0 scales), and 64 % (PROMIS-29 V2.0 items) of the variance in the HUI-3 preference score. Linear equated scores were similar to observed scores, although differences tended to be larger for older study participants. HUI-3 preference scores can be estimated from the PROMIS(®) global health items or PROMIS-29 V2.0. The estimated HUI-3 scores from the PROMIS(®) health measures can be used for economic applications and as a measure of overall HR-QOL in research.
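The abstract does not state the equating formula. The sketch below assumes the standard mean-and-SD-matching form of linear equating, y* = m_target + (sd_target / sd_scores)(y - m_scores), applied to made-up numbers rather than PROMIS or HUI-3 data. It illustrates why equating is paired with regression: regression predictions are compressed toward the mean, and equating restores the spread of the observed scores. The function name `linear_equate` is hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(scores, target):
    """Rescale scores so their mean and SD match those of target."""
    m_s, sd_s = mean(scores), pstdev(scores)
    m_t, sd_t = mean(target), pstdev(target)
    return [m_t + (sd_t / sd_s) * (s - m_s) for s in scores]


# Made-up observed scores and regression predictions (illustrative only).
observed = [0.20, 0.45, 0.60, 0.80, 0.95]
predicted = [0.45, 0.55, 0.60, 0.68, 0.72]  # compressed toward the mean
equated = linear_equate(predicted, observed)
```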
Estimating linear temporal trends from aggregated environmental monitoring data
Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.
2017-01-01
Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear. For example is a linear model; is not. 2. Uniform variance (homoscedasticity
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
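To make the distinction between the two coefficients concrete, here is a small sketch on made-up data (not the radiologic data set used in the tutorial). It computes the Pearson coefficient directly and the Spearman rho as the Pearson coefficient of the ranks, using average ranks for ties. For a monotonic but curved relationship, rho equals 1 while the Pearson coefficient falls below 1. The function names are illustrative.

```python
def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Average ranks (1-based), so ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up monotonic, non-linear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v**3 for v in x]
r = pearson(x, y)
rho = spearman(x, y)
```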
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. 
The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
Kanamori, Shogo; Castro, Marcia C; Sow, Seydou; Matsuno, Rui; Cissokho, Alioune; Jimba, Masamine
2016-01-01
The 5S method is a lean management tool for workplace organization, with 5S being an abbreviation for five Japanese words that translate to English as Sort, Set in Order, Shine, Standardize, and Sustain. In Senegal, the 5S intervention program was implemented in 10 health centers in two regions between 2011 and 2014. This study aimed to identify the impact of the 5S intervention program on the satisfaction of clients (patients and caretakers) who visited the health centers. A standardized 5S intervention protocol was implemented in the health centers using a quasi-experimental separate pre-post samples design (four intervention and three control health facilities). A questionnaire with 10 five-point Likert items was used to measure client satisfaction. Linear regression analysis was conducted to identify the intervention's effect on the client satisfaction scores, represented by an equally weighted average of the 10 Likert items (Cronbach's alpha = 0.83). Additional regression analyses were conducted to identify the intervention's effect on the scores of each Likert item. Backward stepwise linear regression (n = 1,928) indicated a statistically significant effect of the 5S intervention, represented by an increase of 0.19 points in the client satisfaction scores in the intervention group, 6 to 8 months after the intervention (p = 0.014). Additional regression analyses showed significant score increases of 0.44 (p = 0.002), 0.14 (p = 0.002), 0.06 (p = 0.019), and 0.17 (p = 0.044) points on four items, which were, respectively, healthcare staff members' communication, explanations about illnesses or cases, consultation duration, and clients' overall satisfaction. The 5S has the potential to improve client satisfaction at resource-poor health facilities and could therefore be recommended as a strategic option for improving the quality of healthcare service in low- and middle-income countries. 
To explore more effective intervention modalities, further studies need to address the mechanisms by which 5S leads to attitude changes in healthcare staff.
Brian R Miranda; Brian R Sturtevant; Susan I Stewart; Roger B. Hammer
2012-01-01
Most drivers underlying wildfire are dynamic, but at different spatial and temporal scales. We quantified temporal and spatial trends in wildfire patterns over two spatial extents in northern Wisconsin to identify drivers and their change through time. We used spatial point pattern analysis to quantify the spatial pattern of wildfire occurrences, and linear regression...
ERIC Educational Resources Information Center
Pliszka, Steven R.; Matthews, Thomas L.; Braslow, Kenneth J.; Watson, Melissa A.
2006-01-01
Objective: To determine whether methylphenidate (MPH) and mixed salts amphetamine (MSA) have different effects on growth in children with attention-deficit/hyperactivity disorder. Method: Patients treated for at least 1 year with MPH or MSA were identified. A linear regression was performed to determine the effect of stimulant type, patient…
ERIC Educational Resources Information Center
Hobin, Erin P.; Leatherdale, Scott; Manske, Steve; Dubin, Joel A.; Elliott, Susan; Veugelers, Paul
2013-01-01
Background: This study examined differences in students' time spent in physical activity (PA) across secondary schools in rural, suburban, and urban environments and identified the environment-level factors associated with these between school differences in students' PA. Methods: Multilevel linear regression analyses were used to examine the…
2017-10-01
ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...author(s) and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other documentation...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID
Education, Genetic Ancestry, and Blood Pressure in African Americans and Whites
Gravlee, Clarence C.; Mulligan, Connie J.
2012-01-01
Objectives. We assessed the relative roles of education and genetic ancestry in predicting blood pressure (BP) within African Americans and explored the association between education and BP across racial groups. Methods. We used t tests and linear regressions to examine the associations of genetic ancestry, estimated from a genomewide set of autosomal markers, and education with BP variation among African Americans in the Family Blood Pressure Program. We also performed linear regressions in self-identified African Americans and Whites to explore the association of education with BP across racial groups. Results. Education, but not genetic ancestry, significantly predicted BP variation in the African American subsample (b = −0.51 mm Hg per year additional education; P = .001). Although education was inversely associated with BP in the total population, within-group analyses showed that education remained a significant predictor of BP only among the African Americans. We found a significant interaction (b = 3.20; P = .006) between education and self-identified race in predicting BP. Conclusions. Racial disparities in BP may be better explained by differences in education than by genetic ancestry. Future studies of ancestry and disease should include measures of the social environment. PMID:22698014
Education, genetic ancestry, and blood pressure in African Americans and Whites.
Non, Amy L; Gravlee, Clarence C; Mulligan, Connie J
2012-08-01
We assessed the relative roles of education and genetic ancestry in predicting blood pressure (BP) within African Americans and explored the association between education and BP across racial groups. We used t tests and linear regressions to examine the associations of genetic ancestry, estimated from a genomewide set of autosomal markers, and education with BP variation among African Americans in the Family Blood Pressure Program. We also performed linear regressions in self-identified African Americans and Whites to explore the association of education with BP across racial groups. Education, but not genetic ancestry, significantly predicted BP variation in the African American subsample (b=-0.51 mm Hg per year additional education; P=.001). Although education was inversely associated with BP in the total population, within-group analyses showed that education remained a significant predictor of BP only among the African Americans. We found a significant interaction (b=3.20; P=.006) between education and self-identified race in predicting BP. Racial disparities in BP may be better explained by differences in education than by genetic ancestry. Future studies of ancestry and disease should include measures of the social environment.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
On the design of classifiers for crop inventories
NASA Technical Reports Server (NTRS)
Heydorn, R. P.; Takacs, H. C.
1986-01-01
Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations is linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.
Coping Styles in Heart Failure Patients with Depressive Symptoms
Trivedi, Ranak B.; Blumenthal, James A.; O'Connor, Christopher; Adams, Kirkwood; Hinderliter, Alan; Sueta-Dupree, Carla; Johnson, Kristy; Sherwood, Andrew
2009-01-01
Objective Elevated depressive symptoms have been linked to poorer prognosis in heart failure (HF) patients. Our objective was to identify coping styles associated with depressive symptoms in HF patients. Methods 222 stable HF patients (32.75% female, 45.4% non-Hispanic Black) completed multiple questionnaires. Beck Depression Inventory (BDI) assessed depressive symptoms, Life Orientation Test (LOT-R) assessed optimism, ENRICHD Social Support Inventory (ESSI) and Perceived Social Support Scale (PSSS) assessed social support, and COPE assessed coping styles. Linear regression analyses were employed to assess the association of coping styles with continuous BDI scores. Logistic regression analyses were performed using BDI scores dichotomized into BDI<10 versus BDI≥10, to identify coping styles accompanying clinically significant depressive symptoms. Results In linear regression models, higher BDI scores were associated with lower scores on the acceptance (β=-.14), humor (β=-.15), planning (β=-.15), and emotional support (β=-.14) subscales of the COPE, and higher scores on the behavioral disengagement (β=.41), denial (β=.33), venting (β=.25), and mental disengagement (β=.22) subscales. Higher PSSS and ESSI scores were associated with lower BDI scores (β=-.32 and -.25, respectively). Higher LOT-R scores were associated with higher BDI scores (β=.39, p<.001). In logistical regression models, BDI≥10 was associated with greater likelihood of behavioral disengagement (OR=1.3), denial (OR=1.2), mental disengagement (OR=1.3), venting (OR=1.2), and pessimism (OR=1.2), and lower perceived social support measured by PSSS (OR=.92) and ESSI (OR=.92). Conclusion Depressive symptoms in HF patients are associated with avoidant coping, lower perceived social support, and pessimism. Results raise the possibility that interventions designed to improve coping may reduce depressive symptoms. PMID:19773027
Coping styles in heart failure patients with depressive symptoms.
Trivedi, Ranak B; Blumenthal, James A; O'Connor, Christopher; Adams, Kirkwood; Hinderliter, Alan; Dupree, Carla; Johnson, Kristy; Sherwood, Andrew
2009-10-01
Elevated depressive symptoms have been linked to poorer prognosis in heart failure (HF) patients. Our objective was to identify coping styles associated with depressive symptoms in HF patients. A total of 222 stable HF patients (32.75% female, 45.4% non-Hispanic black) completed multiple questionnaires. Beck Depression Inventory (BDI) assessed depressive symptoms, Life Orientation Test (LOT-R) assessed optimism, ENRICHD Social Support Inventory (ESSI) and Perceived Social Support Scale (PSSS) assessed social support, and COPE assessed coping styles. Linear regression analyses were employed to assess the association of coping styles with continuous BDI scores. Logistic regression analyses were performed using BDI scores dichotomized into BDI<10 vs. BDI≥10, to identify coping styles accompanying clinically significant depressive symptoms. In linear regression models, higher BDI scores were associated with lower scores on the acceptance (beta=-.14), humor (beta=-.15), planning (beta=-.15), and emotional support (beta=-.14) subscales of the COPE, and higher scores on the behavioral disengagement (beta=.41), denial (beta=.33), venting (beta=.25), and mental disengagement (beta=.22) subscales. Higher PSSS and ESSI scores were associated with lower BDI scores (beta=-.32 and -.25, respectively). Higher LOT-R scores were associated with higher BDI scores (beta=.39, P<.001). In logistical regression models, BDI≥10 was associated with greater likelihood of behavioral disengagement (OR=1.3), denial (OR=1.2), mental disengagement (OR=1.3), venting (OR=1.2), and pessimism (OR=1.2), and lower perceived social support measured by PSSS (OR=.92) and ESSI (OR=.92). Depressive symptoms in HF patients are associated with avoidant coping, lower perceived social support, and pessimism. Results raise the possibility that interventions designed to improve coping may reduce depressive symptoms.
Li, Liang; Wang, Yiying; Xu, Jiting; Flora, Joseph R V; Hoque, Shamia; Berge, Nicole D
2018-08-01
Hydrothermal carbonization (HTC) is a wet, low temperature thermal conversion process that continues to gain attention for the generation of hydrochar. The importance of specific process conditions and feedstock properties on hydrochar characteristics is not well understood. To evaluate this, linear and non-linear models were developed to describe hydrochar characteristics based on data collected from HTC-related literature. A Sobol analysis was subsequently conducted to identify parameters that most influence hydrochar characteristics. Results from this analysis indicate that for each investigated hydrochar property, the model fit and predictive capability associated with the random forest models is superior to both the linear and regression tree models. Based on results from the Sobol analysis, the feedstock properties and process conditions most influential on hydrochar yield, carbon content, and energy content were identified. In addition, a variational process parameter sensitivity analysis was conducted to determine how feedstock property importance changes with process conditions. Copyright © 2018 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, J; Chao, M
2016-06-15
Purpose: To develop a novel strategy to extract the respiratory motion of the thoracic diaphragm from kilovoltage cone beam computed tomography (CBCT) projections by a constrained linear regression optimization technique. Methods: A parabolic function was identified as the geometric model and was employed to fit the shape of the diaphragm on the CBCT projections. The search was initialized by five manually placed seeds on a pre-selected projection image. Temporal redundancies (the enabling phenomenology in video compression and encoding techniques) inherent in the dynamic properties of the diaphragm motion were integrated with the geometrical shape of the diaphragm boundary and the associated algebraic constraint, which significantly reduced the search space of viable parabolic parameters; the resulting problem can be effectively optimized by a constrained linear regression approach on the subsequent projections. The algebraic constraints stipulating the kinetic range of the motion, together with the spatial constraint preventing any unphysical deviations, made it possible to obtain the optimal contour of the diaphragm with minimal initialization. The algorithm was assessed by a fluoroscopic movie acquired at a fixed anterior-posterior direction and kilovoltage CBCT projection image sets from four lung and two liver patients. The automatic tracing by the proposed algorithm and manual tracking by a human operator were compared in both space and frequency domains. Results: The error between the estimated and manual detections for the fluoroscopic movie was 0.54mm with standard deviation (SD) of 0.45mm, while the average error for the CBCT projections was 0.79mm with SD of 0.64mm for all enrolled patients. The submillimeter accuracy outcome exhibits the promise of the proposed constrained linear regression approach to track the diaphragm motion on rotational projection images. Conclusion: The new algorithm will provide a potential solution to rendering diaphragm motion and ultimately improving tumor motion management for radiation therapy of cancer patients.
Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert
2012-01-01
Purpose: To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods: Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R². Hypothesis testing of coefficients over the patient population was performed. Results: Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost ~ 0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost ~ FDGpre^0.93, p<0.001). Univariate mixture model fits of FDGpre improved R² from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions: Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
Factors influencing the postoperative use of analgesics in dogs and cats by Canadian veterinarians.
Dohoo, S E; Dohoo, I R
1996-09-01
Four hundred and seventeen Canadian veterinarians were surveyed to determine their postoperative use of analgesics in dogs and cats following 6 categories of surgeries, and their opinion toward pain perception and perceived complications associated with the postoperative use of potent opioid analgesics. Three hundred and seventeen (76%) returned the questionnaire. An analgesic user was defined as a veterinarian who administers analgesics to at least 50% of dogs or 50% of cats following abdominal surgery, excluding ovariohysterectomy. The veterinarians responding exhibited a bimodal distribution of analgesic use, with 49.5% being defined as analgesic users. These veterinarians tended to use analgesics in 100% of animals following abdominal surgery. Veterinarians defined as analgesic nonusers rarely used postoperative analgesics following any abdominal surgery. Pain perception was defined as the average of pain rankings (on a scale of 1 to 10) following abdominal surgery, or the value for dogs or cats if the veterinarian worked with only 1 of the 2 species. Maximum concern about the risks associated with the postoperative use of potent opioid agonists was defined as the highest ranking assigned to any of the 7 risks evaluated in either dogs or cats. Logistic regression analysis identified the pain perception score and the maximum concern regarding the use of potent opioid agonists in the postoperative period as the 2 factors that distinguished analgesic users from analgesic nonusers. This model correctly classified 68% of veterinarians as analgesic users or nonusers. Linear regression analysis identified gender and the presence of an animal health technologist in the practice as the 2 factors that influenced pain perception by veterinarians. 
Linear regression analysis identified working with an animal health technologist, graduation within the past 10 years, and attendance at continuing education as factors that influenced maximum concern about the postoperative use of opioid agonists.
NASA Astrophysics Data System (ADS)
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.
1983-09-01
books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Sanford, Applied Linear Regression, Wiley, 1980. 40
Graphical Description of Johnson-Neyman Outcomes for Linear and Quadratic Regression Surfaces.
ERIC Educational Resources Information Center
Schafer, William D.; Wang, Yuh-Yin
A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn…
Teaching the Concept of Breakdown Point in Simple Linear Regression.
ERIC Educational Resources Information Center
Chan, Wai-Sum
2001-01-01
Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…
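The influence of a single extreme point on ordinary least squares can be illustrated numerically. The following is a minimal sketch, not taken from the article: the data are made up, and the median-of-pairwise-slopes estimator is included only as an assumed point of contrast with a higher breakdown point.

```python
import statistics

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def median_pairwise_slope(xs, ys):
    """Median of all pairwise slopes (illustrative robust contrast)."""
    n = len(xs)
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(n)
        for j in range(i + 1, n)
        if xs[j] != xs[i]
    ]
    return statistics.median(slopes)

# Hypothetical data: ten points lying exactly on y = 2x + 1.
xs = list(range(10))
clean = [2 * x + 1 for x in xs]
# Corrupt a single observation; making it larger moves the OLS slope
# without bound, which is why the OLS breakdown point tends to zero.
corrupted = clean[:-1] + [1000]

print(ols_slope(xs, clean), ols_slope(xs, corrupted))
print(median_pairwise_slope(xs, corrupted))
```

One corrupted point out of ten is enough to pull the least-squares slope far from 2, while the median of pairwise slopes is unchanged here because most pairs do not involve the corrupted point.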
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
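The general idea of replacing manual truncation with a search over local fits can be sketched as follows. This is an illustrative simplification and not the LoLinR algorithm: the fixed window width, the residual-sum-of-squares criterion, the function names, and the data are all assumptions made for the example.

```python
def fit_line(ts, ys):
    """Least-squares line; returns (slope, intercept, residual sum of squares)."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    stt = sum((t - mt) ** 2 for t in ts)
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    slope = sty / stt
    intercept = my - slope * mt
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    return slope, intercept, rss

def most_linear_window(ts, ys, width):
    """Slide a fixed-width window and keep the fit with the smallest RSS."""
    best = None
    for start in range(len(ts) - width + 1):
        slope, _, rss = fit_line(ts[start:start + width], ys[start:start + width])
        if best is None or rss < best[2]:
            best = (start, slope, rss)
    return best

# Hypothetical time series: linear decline that then levels off.
ts = list(range(10))
ys = [10, 8, 6, 4, 2, 0.5, -0.2, -0.5, -0.6, -0.65]

start, slope, rss = most_linear_window(ts, ys, width=5)
print(start, slope, rss)
```

A real analysis would also need to compare windows of different widths and account for noise, which this sketch deliberately leaves out.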
Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.
2006-01-01
As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.
NASA Astrophysics Data System (ADS)
Merkord, C. L.; Liu, Y.; DeVos, M.; Wimberly, M. C.
2015-12-01
Malaria early detection and early warning systems are important tools for public health decision makers in regions where malaria transmission is seasonal and varies from year to year with fluctuations in rainfall and temperature. Here we present a new data-driven dynamic linear model based on the Kalman filter with time-varying coefficients that are used to identify malaria outbreaks as they occur (early detection) and predict the location and timing of future outbreaks (early warning). We fit linear models of malaria incidence with trend and Fourier form seasonal components using three years of weekly malaria case data from 30 districts in the Amhara Region of Ethiopia. We identified past outbreaks by comparing the modeled prediction envelopes with observed case data. Preliminary results demonstrated the potential for improved accuracy and timeliness over commonly-used methods in which thresholds are based on simpler summary statistics of historical data. Other benefits of the dynamic linear modeling approach include robustness to missing data and the ability to fit models with relatively few years of training data. To predict future outbreaks, we started with the early detection model for each district and added a regression component based on satellite-derived environmental predictor variables including precipitation data from the Tropical Rainfall Measuring Mission (TRMM) and land surface temperature (LST) and spectral indices from the Moderate Resolution Imaging Spectroradiometer (MODIS). We included lagged environmental predictors in the regression component of the model, with lags chosen based on cross-correlation of the one-step-ahead forecast errors from the first model. Our results suggest that predictions of future malaria outbreaks can be improved by incorporating lagged environmental predictors.
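The early-detection step (comparing each observation with a prediction envelope from a Kalman filter) can be sketched with a much simpler model than the one described. The example below assumes a local-level model only, with no trend, seasonal, or environmental regression components; the variances `q` and `r`, the envelope multiplier `z`, and the case counts are hypothetical.

```python
import math

def flag_outbreaks(ys, q, r, m0, p0, z=2.0):
    """Local-level Kalman filter; flags observations above the one-step
    forecast envelope (forecast mean + z forecast standard deviations)."""
    m, p = m0, p0  # filtered level mean and variance
    flags = []
    for y in ys:
        p_pred = p + q              # predicted state variance
        f = p_pred + r              # one-step forecast variance
        upper = m + z * math.sqrt(f)
        flags.append(y > upper)     # compare before updating on y
        k = p_pred / f              # Kalman gain
        m = m + k * (y - m)         # update level with forecast error
        p = (1 - k) * p_pred
    return flags

# Hypothetical weekly case counts with one spike.
cases = [10, 11, 9, 10, 30, 10]
print(flag_outbreaks(cases, q=0.1, r=1.0, m0=10.0, p0=1.0))
```

Each observation is tested against the envelope before it is used to update the state, so a spike cannot widen its own envelope.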
Locally linear regression for pose-invariant face recognition.
Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen
2007-07-01
The variation of facial appearance due to the viewpoint (/pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show distinct advantage of the proposed method over Eigen light-field method.
Linear growth trajectories in Zimbabwean infants
Gough, Ethan K; Moodie, Erica EM; Prendergast, Andrew J; Ntozini, Robert; Moulton, Lawrence H; Humphrey, Jean H; Manges, Amee R
2016-01-01
Background: Undernutrition in early life underlies 45% of child deaths globally. Stunting malnutrition (suboptimal linear growth) also has long-term negative effects on childhood development. Linear growth deficits accrue in the first 1000 d of life. Understanding the patterns and timing of linear growth faltering or recovery during this period is critical to inform interventions to improve infant nutritional status. Objective: We aimed to identify the pattern and determinants of linear growth trajectories from birth through 24 mo of age in a cohort of Zimbabwean infants. Design: We performed a secondary analysis of longitudinal data from a subset of 3338 HIV-unexposed infants in the Zimbabwe Vitamin A for Mothers and Babies trial. We used k-means clustering for longitudinal data to identify linear growth trajectories and multinomial logistic regression to identify covariates that were associated with each trajectory group. Results: For the entire population, the mean length-for-age z score declined from −0.6 to −1.4 between birth and 24 mo of age. Within the population, 4 growth patterns were identified that were each characterized by worsening linear growth restriction but varied in the timing and severity of growth declines. In our multivariable model, 1-U increments in maternal height and education and infant birth weight and length were associated with greater relative odds of membership in the least–growth restricted groups (A and B) and reduced odds of membership in the more–growth restricted groups (C and D). Male infant sex was associated with reduced odds of membership in groups A and B but with increased odds of membership in groups C and D. Conclusion: In this population, all children were experiencing growth restriction but differences in magnitude were influenced by maternal height and education and infant sex, birth weight, and birth length, which suggest that key determinants of linear growth may already be established by the time of birth. 
This trial was registered at clinicaltrials.gov as NCT00198718. PMID:27806980
Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello
2018-04-22
A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovering of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.
Improving power and robustness for detecting genetic association with extreme-value sampling design.
Chen, Hua Yun; Li, Mingyao
2011-12-01
Extreme-value sampling design that samples subjects with extremely large or small quantitative trait values is commonly used in genetic association studies. Samples in such designs are often treated as "cases" and "controls" and analyzed using logistic regression. Such a case-control analysis ignores the potential dose-response relationship between the quantitative trait and the underlying trait locus and thus may lead to loss of power in detecting genetic association. An alternative approach to analyzing such data is to model the dose-response relationship by a linear regression model. However, parameter estimation from this model can be biased, which may lead to inflated type I errors. We propose a robust and efficient approach that takes into consideration both the biased sampling design and the potential dose-response relationship. Extensive simulations demonstrate that the proposed method is more powerful than the traditional logistic regression analysis and is more robust than the linear regression analysis. We applied our method to the analysis of a candidate gene association study on high-density lipoprotein cholesterol (HDL-C) which includes study subjects with extremely high or low HDL-C levels. Using our method, we identified several SNPs showing stronger evidence of association with HDL-C than the traditional case-control logistic regression analysis. Our results suggest that it is important to appropriately model the quantitative traits and to adjust for the biased sampling when a dose-response relationship exists in extreme-value sampling designs. © 2011 Wiley Periodicals, Inc.
Lamm, Steven H.; Ferdosi, Hamid; Dissen, Elisabeth K.; Li, Ji; Ahn, Jaeil
2015-01-01
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1–1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100–150 µg/L arsenic. PMID:26690190
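The X-intercept of a no-constant linear-quadratic model follows directly from its two coefficients. Writing u = log(exposure), the model is log(RR) = b1·u + b2·u², which is zero at u = 0 and at u = -b1/b2; with b1 negative and b2 positive, the second root marks the exposure above which the fitted risk exceeds 1. The sketch below fits such a model by least squares. The data points and the coefficients used to generate them are hypothetical and are not the values estimated in the study.

```python
import math

def fit_no_constant_quadratic(us, ys):
    """Least-squares fit of y = b1*u + b2*u**2 with no constant term,
    solving the 2x2 normal equations by Cramer's rule."""
    a11 = sum(u ** 2 for u in us)
    a12 = sum(u ** 3 for u in us)
    a22 = sum(u ** 4 for u in us)
    c1 = sum(u * y for u, y in zip(us, ys))
    c2 = sum(u ** 2 * y for u, y in zip(us, ys))
    det = a11 * a22 - a12 ** 2
    b1 = (c1 * a22 - c2 * a12) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2

# Hypothetical exposures (µg/L) and log10 relative risks generated from
# assumed coefficients b1 = -1.0 and b2 = 0.5.
exposures = [10, 50, 100, 500, 1000]
us = [math.log10(x) for x in exposures]
ys = [-1.0 * u + 0.5 * u ** 2 for u in us]

b1, b2 = fit_no_constant_quadratic(us, ys)
x_intercept = 10 ** (-b1 / b2)  # exposure where the fitted log RR returns to 0
print(b1, b2, x_intercept)
```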
NASA Astrophysics Data System (ADS)
Lucifredi, A.; Mazzieri, C.; Rossi, M.
2000-05-01
Since the operational conditions of a hydroelectric unit can vary within a wide range, the monitoring system must be able to distinguish between the variations of the monitored variable caused by variations of the operating conditions and those due to the onset and progression of failures and misoperations. The paper aims to identify the best technique to be adopted for the monitoring system. Three different methods have been implemented and compared. Two of them use statistical techniques: the first, linear multiple regression, expresses the monitored variable as a linear function of the process parameters (independent variables), while the second, the dynamic kriging technique, is a modified form of multiple linear regression that represents the monitored variable as a linear combination of the process variables in such a way as to minimize the variance of the estimation error. The third is based on neural networks. Tests have shown that the monitoring system based on the kriging technique is not affected by some problems common to the other two models. One is the requirement of a large amount of data for tuning, both for training the neural network and for defining the optimum plane for the multiple regression, not only in the system start-up phase but also after a routine maintenance operation involving the replacement of machinery components that have a direct impact on the observed variable. Another is the need for different models to describe satisfactorily the different ranges of operation of the plant. The monitoring system based on the kriging statistical technique overcomes these difficulties: it does not require a large amount of data to be tuned and is immediately operational (given two points, the third can be immediately estimated); in addition, the model follows the system without adapting itself to it.
The results of the experimentation performed seem to indicate that a model based on a neural network or on a linear multiple regression is not optimal, and that a different approach is necessary to reduce the amount of work during the learning phase using, when available, all the information stored during the initial phase of the plant to build the reference baseline, elaborating, if it is the case, the raw information available. A mixed approach using the kriging statistical technique and neural network techniques could optimise the result.
Levine, Matthew E; Albers, David J; Hripcsak, George
2016-01-01
Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
Effect of Malmquist bias on correlation studies with IRAS data base
NASA Technical Reports Server (NTRS)
Verter, Frances
1993-01-01
The relationships between galaxy properties in the sample of Trinchieri et al. (1989) are reexamined with corrections for Malmquist bias. The linear correlations are tested and linear regressions are fit for log-log plots of L(FIR), L(H-alpha), and L(B) as well as ratios of these quantities. The linear correlations are corrected for Malmquist bias using the method of Verter (1988), in which each galaxy observation is weighted by the inverse of its sampling volume. The linear regressions are corrected for Malmquist bias by a new method invented here in which each galaxy observation is weighted by its sampling volume. The results of the correlations and regressions for the sample are significantly changed in the anticipated sense: the corrected correlation confidences are lower and the corrected slopes of the linear regressions are lower. The elimination of Malmquist bias eliminates the nonlinear rise in luminosity that has caused some authors to hypothesize additional components of FIR emission.
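Weighting each observation in a linear regression amounts to weighted least squares. The sketch below shows only that generic mechanism; the weights and data are hypothetical placeholders and do not reproduce the sampling-volume weights or the specific correction described in the abstract.

```python
def weighted_linear_fit(xs, ys, ws):
    """Weighted least-squares line y = a + b*x; returns (slope, intercept)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted mean of x
    my = sum(w * y for w, y in zip(ws, ys)) / sw   # weighted mean of y
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical log-log data; the last point lies off the line y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 20]

# Equal weights let the last point steepen the fit; a zero weight removes it.
print(weighted_linear_fit(xs, ys, [1, 1, 1, 1]))
print(weighted_linear_fit(xs, ys, [1, 1, 1, 0]))
```

The comparison shows how strongly the fitted slope can depend on the weighting scheme, which is the reason a bias correction expressed through weights can change regression results.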
Analysis of regression methods for solar activity forecasting
NASA Technical Reports Server (NTRS)
Lundquist, C. A.; Vaughan, W. W.
1979-01-01
The paper deals with the potential use of the most recent solar data to project trends in the next few years. Assuming that a mode of solar influence on weather can be identified, advantageous use of that knowledge presumably depends on estimating future solar activity. A frequently used technique for solar cycle predictions is a linear regression procedure along the lines formulated by McNish and Lincoln (1949). The paper presents a sensitivity analysis of the behavior of such regression methods relative to the following aspects: cycle minimum, time into cycle, composition of historical data base, and unnormalized vs. normalized solar cycle data. Comparative solar cycle forecasts for several past cycles are presented as to these aspects of the input data. Implications for the current cycle, No. 21, are also given.
1998-01-01
Changes in domestic refining operations are identified and related to the summer Reid vapor pressure (RVP) restrictions and oxygenate blending requirements. This analysis uses published Energy Information Administration survey data and linear regression equations from the Short-Term Integrated Forecasting System (STIFS). The STIFS model is used for producing forecasts appearing in the Short-Term Energy Outlook.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
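The point estimates for ordinary least products regression are simple enough to compute by hand. The sketch below assumes the standard formulation in which the OLP slope is the ratio of the standard deviations of y and x, carrying the sign of their covariance; it returns point estimates only and makes no attempt at the confidence intervals discussed in the abstract. The data are made up.

```python
import math

def olp_fit(xs, ys):
    """Ordinary least products (Model II) slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Slope magnitude is sd(y)/sd(x); its sign follows the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx

def ols_slope(xs, ys):
    """Ordinary least-squares (Model I) slope of y on x, for comparison."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

# Hypothetical paired measurements with error in both variables.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(olp_fit(xs, ys), ols_slope(xs, ys))
```

Because the least-squares slope equals the correlation coefficient times sd(y)/sd(x), the OLP slope is never smaller in magnitude than the Model I slope, and the two coincide only when the points are perfectly collinear.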
Huang, Li-Shan; Myers, Gary J; Davidson, Philip W; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W; Cernichiari, Elsa; Shamlaye, Conrad F; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W
2007-11-01
Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age 9 years. The analyses for the most recent 9-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated non-linearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of 21 endpoints available at age 9 years, we chose the Wechsler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other 9-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels.
Machine learning approaches to the social determinants of health in the health and retirement study.
Seligman, Benjamin; Tuljapurkar, Shripad; Rehkopf, David
2018-04-01
Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study. A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks. All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data (R² between 0.4 and 0.6, versus <0.3 for all others). Across machine learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender. Some of the machine learning methods do not improve prediction or fit beyond simpler models; however, neural networks performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.
NASA Technical Reports Server (NTRS)
Maughan, P. M. (Principal Investigator)
1973-01-01
The author has identified the following significant results. Linear regression of Secchi disc visibility against number of sets yielded significant results in a number of instances. The variability seen in the slope of the regression lines is due to the nonuniformity of sample size. The longer the period sampled, the larger the total number of attempts. Further, there is no reason to expect either the influence of transparency or of other variables to remain constant throughout the season. However, the fact that the data for the entire season, variable as they are, were significant at the 5% level suggests their potential utility for predictive modeling. Thus, this regression equation will be considered representative and will be utilized for the first numerical model. Secchi disc visibility was also regressed against number of sets for the three-day period September 27-September 29, 1972 to determine if surface truth data supported the intense relationship between ERTS-1 identified turbidity and fishing effort previously discussed. A strongly negative correlation was found. These relationships lend additional credence to the hypothesis that ERTS imagery, when utilized as a source of visibility (turbidity) data, may be useful as a predictive tool.
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
NASA Astrophysics Data System (ADS)
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2018-01-01
Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
ERIC Educational Resources Information Center
Rocconi, Louis M.
2011-01-01
Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Classical Testing in Functional Linear Models
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155
Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi
2013-09-01
Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). 
Furthermore, examples are given as to how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis, and it saves significant time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
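The two-concentration idea can be sketched in a few lines: fit the calibration line through the lowest and highest standards only, then back-calculate unknowns and quality-control (QC) samples from it. The function names, the example responses, and the use of a mid-range QC as a linearity check are illustrative assumptions, not the procedure reported by the authors.

```python
def two_point_calibration(c_low, r_low, c_high, r_high):
    """Line through the lowest and highest calibration standards.

    c_* are nominal concentrations, r_* are instrument responses.
    Returns (slope, intercept) of response = slope * conc + intercept.
    """
    if c_high == c_low:
        raise ValueError("the two concentration levels must differ")
    slope = (r_high - r_low) / (c_high - c_low)
    intercept = r_low - slope * c_low
    return slope, intercept


def back_calculate(response, slope, intercept):
    """Convert an instrument response back to a concentration."""
    if slope == 0:
        raise ValueError("calibration slope must be non-zero")
    return (response - intercept) / slope


def percent_bias(measured, nominal):
    """Relative deviation of a back-calculated value from its nominal value."""
    return 100.0 * (measured - nominal) / nominal


# Hypothetical standards at 1 and 1000 ng/mL and a mid-range QC at 400 ng/mL.
slope, intercept = two_point_calibration(1.0, 0.0051, 1000.0, 5.02)
qc_measured = back_calculate(1.99, slope, intercept)
print(f"mid QC bias: {percent_bias(qc_measured, 400.0):.1f}%")
```

A large bias for a mid-range QC would hint at curvature that a two-point line cannot reveal on its own, which is one plausible way to monitor linearity across the range.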
Redmond, Tony; O'Leary, Neil; Hutchison, Donna M; Nicolela, Marcelo T; Artes, Paul H; Chauhan, Balwantray C
2013-12-01
A new analysis method called permutation of pointwise linear regression measures the significance of deterioration over time at each visual field location, combines the significance values into an overall statistic, and then determines the likelihood of change in the visual field. Because the outcome is a single P value, individualized to that specific visual field and independent of the scale of the original measurement, the method is well suited for comparing techniques with different stimuli and scales. To test the hypothesis that frequency-doubling matrix perimetry (FDT2) is more sensitive than standard automated perimetry (SAP) in identifying visual field progression in glaucoma. Patients with open-angle glaucoma and healthy controls were examined by FDT2 and SAP, both with the 24-2 test pattern, on the same day at 6-month intervals in a longitudinal prospective study conducted in a hospital-based setting. Only participants with at least 5 examinations were included. Data were analyzed with permutation of pointwise linear regression. Permutation of pointwise linear regression is individualized to each participant, in contrast to current analyses in which the statistical significance is inferred from population-based approaches. Analyses were performed with both total deviation and pattern deviation. Sixty-four patients and 36 controls were included in the study. The median age, SAP mean deviation, and follow-up period were 65 years, -2.6 dB, and 5.4 years, respectively, in patients and 62 years, +0.4 dB, and 5.2 years, respectively, in controls. Using total deviation analyses, statistically significant deterioration was identified in 17% of patients with FDT2, in 34% of patients with SAP, and in 14% of patients with both techniques; in controls these percentages were 8% with FDT2, 31% with SAP, and 8% with both. 
Using pattern deviation analyses, statistically significant deterioration was identified in 16% of patients with FDT2, in 17% of patients with SAP, and in 3% of patients with both techniques; in controls these values were 3% with FDT2 and none with SAP. No evidence was found that FDT2 is more sensitive than SAP in identifying visual field deterioration. In about one-third of healthy controls, age-related deterioration with SAP reached statistical significance.
A Linear Regression and Markov Chain Model for the Arabian Horse Registry
1993-04-01
... as a tax deduction? Yes / No ... 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be ... (Mark one ... T-4367, A Linear Regression and Markov Chain Model for the Arabian Horse Registry ... the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses. A linear regression model was utilized to ...
Kabir, Alamgir; Rahman, Md. Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D. W.; Labrique, Alain B.; Rashid, Mahbubur; Christian, Parul; West, Keith P.
2017-01-01
Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (Standardized β = -0.29 − -0.19; p<0.001). Scatter plots of the scores of first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than the ordinary least squares regression and provided the effect measure of the covariates with greater accuracy as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when either there are multiple outcome measurements in the same study, or the covariates are correlated, or both situations exist in a dataset. PMID:29261760
Caywood, Matthew S.; Roberts, Daniel M.; Colombe, Jeffrey B.; Greenwald, Hal S.; Weiland, Monica Z.
2017-01-01
There is increasing interest in real-time brain-computer interfaces (BCIs) for the passive monitoring of human cognitive state, including cognitive workload. Too often, however, effective BCIs based on machine learning techniques may function as “black boxes” that are difficult to analyze or interpret. In an effort toward more interpretable BCIs, we studied a family of N-back working memory tasks using a machine learning model, Gaussian Process Regression (GPR), which was both powerful and amenable to analysis. Participants performed the N-back task with three stimulus variants, auditory-verbal, visual-spatial, and visual-numeric, each at three working memory loads. GPR models were trained and tested on EEG data from all three task variants combined, in an effort to identify a model that could be predictive of mental workload demand regardless of stimulus modality. To provide a comparison for GPR performance, a model was additionally trained using multiple linear regression (MLR). The GPR model was effective when trained on individual participant EEG data, resulting in an average standardized mean squared error (sMSE) between true and predicted N-back levels of 0.44. In comparison, the MLR model using the same data resulted in an average sMSE of 0.55. We additionally demonstrate how GPR can be used to identify which EEG features are relevant for prediction of cognitive workload in an individual participant. A fraction of EEG features accounted for the majority of the model’s predictive power; using only the top 25% of features performed nearly as well as using 100% of features. Subsets of features identified by linear models (ANOVA) were not as efficient as subsets identified by GPR. This raises the possibility of BCIs that require fewer model features while capturing all of the information needed to achieve high predictive accuracy. PMID:28123359
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Sieve estimation of Cox models with latent structures.
Cao, Yongxiu; Huang, Jian; Liu, Yanyan; Zhao, Xingqiu
2016-12-01
This article considers sieve estimation in the Cox model with an unknown regression structure based on right-censored data. We propose a semiparametric pursuit method to simultaneously identify and estimate linear and nonparametric covariate effects based on B-spline expansions through a penalized group selection method with concave penalties. We show that the estimators of the linear effects and the nonparametric component are consistent. Furthermore, we establish the asymptotic normality of the estimator of the linear effects. To compute the proposed estimators, we develop a modified blockwise majorization descent algorithm that is efficient and easy to implement. Simulation studies demonstrate that the proposed method performs well in finite sample situations. We also use the primary biliary cirrhosis data to illustrate its application. © 2016, The International Biometric Society.
Inferring gene regression networks with model trees
2010-01-01
Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of the target genes by linear regression functions.
They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
Rodriguez-Sabate, Clara; Morales, Ingrid; Sanchez, Alberto; Rodriguez, Manuel
2017-01-01
The complexity of basal ganglia (BG) interactions is often condensed into simple models, mainly based on animal data, that present the BG in closed-loop cortico-subcortical circuits of excitatory/inhibitory pathways which analyze the incoming cortical data and return the processed information to the cortex. This study was aimed at identifying functional relationships in the BG motor loop of 24 healthy subjects who provided written, informed consent and whose BOLD activity was recorded by MRI methods. The analysis of the functional interaction between these centers by correlation techniques and multiple linear regression showed non-linear relationships which cannot be suitably addressed with these methods. Multiple correspondence analysis (MCA), an unsupervised multivariable procedure which can identify non-linear interactions, was used to study the functional connectivity of the BG when subjects were at rest. Linear methods showed different functional interactions expected according to current BG models. MCA showed additional functional interactions which were not evident when using linear methods. Seven functional configurations of the BG were identified with MCA: two involving the primary motor and somatosensory cortex, one involving the deepest BG (external-internal globus pallidus, subthalamic nucleus and substantia nigra), one with the input-output BG centers (putamen and motor thalamus), two linking the input-output centers with other BG (external pallidum and subthalamic nucleus), and one linking the external pallidum and the substantia nigra. The results provide evidence that the non-linear MCA and linear methods are complementary and are best used in conjunction to more fully understand the nature of functional connectivity of brain centers.
Liu, Weijian; Wang, Yilong; Chen, Yuanchen; Tao, Shu; Liu, Wenxin
2017-07-01
The total concentrations and component profiles of polycyclic aromatic hydrocarbons (PAHs) in ambient air, surface soil and wheat grain collected from wheat fields near a large steel-smelting manufacturer in Northern China were determined. Based on the specific isomeric ratios of paired species in ambient air, principal component analysis and multivariate linear regression, the main emission source of local PAHs was identified as a mixture of industrial and domestic coal combustion, biomass burning and traffic exhaust. The total organic carbon (TOC) fraction was considerably correlated with the total and individual PAH concentrations in surface soil. The total concentrations of PAHs in wheat grain were relatively low, with dominant low molecular weight constituents, and the compositional profile was more similar to that in ambient air than in topsoil. Combined with more significant results from partial correlation and linear regression models, the contribution from air PAHs to grain PAHs may be greater than that from soil PAHs. Copyright © 2016. Published by Elsevier B.V.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. 
CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
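To see why a straight-line fit can understate the initial flux, the sketch below generates a noise-free saturating exponential concentration curve and compares the least-squares slope over a two-minute closure with the true slope at the moment of closure. The functional form c(t) = c_inf + (c0 - c_inf) * exp(-k * t) and every parameter value are illustrative assumptions, not the authors' fitted model or data.

```python
import math


def exp_concentration(t, c0, c_inf, k):
    """Headspace concentration relaxing from c0 toward c_inf at rate k."""
    return c_inf + (c0 - c_inf) * math.exp(-k * t)


def ols_slope(t, c):
    """Ordinary least-squares slope of c regressed on t."""
    n = len(t)
    mt = sum(t) / n
    mc = sum(c) / n
    num = sum((ti - mt) * (ci - mc) for ti, ci in zip(t, c))
    den = sum((ti - mt) ** 2 for ti in t)
    return num / den


# Hypothetical release experiment: ppm, ppm, 1/s.
c0, c_inf, k = 380.0, 600.0, 0.01
times = [float(s) for s in range(0, 121, 5)]  # 0..120 s, one reading per 5 s
conc = [exp_concentration(t, c0, c_inf, k) for t in times]

initial_slope = k * (c_inf - c0)       # dc/dt at t = 0 for the assumed curve
linear_slope = ols_slope(times, conc)  # what a straight-line fit reports
ratio = linear_slope / initial_slope

print(f"linear fit recovers {100 * ratio:.0f}% of the initial slope")
```

Because the assumed curve flattens over time, the fitted line averages in the later, shallower part of the record, so the ratio stays below one and shrinks as the closure time or k grows.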
Biostatistics Series Module 6: Correlation and Linear Regression
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous. PMID:27904175
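As a rough illustration of the quantities named in this abstract, the sketch below computes Pearson's r, Spearman's rho, the coefficient of determination and the least-squares line y = a + bx using only the Python standard library. The data values and the function names (`pearson_r`, `spearman_rho`, `least_squares_line`) are made up for illustration and do not come from the article.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

# Hypothetical paired measurements, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
a, b = least_squares_line(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, rho = {rho:.4f}")
print(f"fitted line: y = {a:.3f} + {b:.3f} x")
```

A scatter plot of x against y would still be the first step in practice; the sketch only covers the arithmetic.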
Internet sexuality research with rural men who have sex with men: can we recruit and retain them?
Bowen, Anne
2005-11-01
This study examines the utility of internet banner ads for recruiting rural MSM and identifies correlates of internet HIV risk survey initiation and completion. Banner ads were shown on a popular internet dating site for one month and resulted in 1,045 rural MSM, from 49 States, Canada, Australia/New Zealand, and 5 from other countries initiating the questionnaire. Logistic regression indicated that progression beyond screening questions was negatively related to "expecting pay, but not being paid" and positively related to "using chat rooms to find friends" and identifying as gay. Linear regression indicated that the absolute number of responses by consenting participants was positively correlated with reimbursement, number of sexual partners, motivated by money, and having been HIV tested. Overall, this sample represents one of the largest rural MSM samples; survey completion was high and strengthened by reimbursement and possibly by awareness of HIV risk. Generalizability was limited by low participation of minority and non-gay identified MSM.
Cobb, J; Cule, E; Moncrieffe, H; Hinks, A; Ursu, S; Patrick, F; Kassoumeri, L; Flynn, E; Bulatović, M; Wulffraat, N; van Zelst, B; de Jonge, R; Bohm, M; Dolezalova, P; Hirani, S; Newman, S; Whitworth, P; Southwood, T R; De Iorio, M; Wedderburn, L R; Thomson, W
2014-08-01
Clinical response to methotrexate (MTX) treatment for children with juvenile idiopathic arthritis (JIA) displays considerable heterogeneity. Currently, there are no reliable predictors to identify non-responders: earlier identification could lead to a targeted treatment. We genotyped 759 JIA cases from the UK, the Netherlands and Czech Republic. Clinical variables were measured at baseline and 6 months after start of the treatment. In Phase I analysis, samples were analysed for association with MTX response using ordinal regression of ACR-pedi categories and linear regression of change in clinical variables, which identified 31 genetic regions (P < 0.001). Phase II analysis increased SNP density in the most strongly associated regions, identifying 14 regions (P < 1 × 10⁻⁵): three contain genes of particular biological interest (ZMIZ1, TGIF1 and CFTR). These data suggest a role for novel pathways in MTX response and further investigations within associated regions will help to reach our goal of predicting response to MTX in JIA.
Schlairet, Maura C; Schlairet, Timothy James; Sauls, Denise H; Bellflowers, Lois
2015-03-01
Establishing the impact of the high-fidelity simulation environment on student performance, as well as identifying factors that could predict learning, would refine simulation outcome expectations among educators. The purpose of this quasi-experimental pilot study was to explore the impact of simulation on emotion and cognitive load among beginning nursing students. Forty baccalaureate nursing students participated in teaching simulations, rated their emotional state and cognitive load, and completed evaluation simulations. Two principal components of emotion were identified representing the pleasant activation and pleasant deactivation components of affect. Mean rating of cognitive load following simulation was high. Linear regression identified slight but statistically nonsignificant positive associations between principal components of emotion and cognitive load. Logistic regression identified a negative but statistically nonsignificant effect of cognitive load on assessment performance. Among lower ability students, a more pronounced effect of cognitive load on assessment performance was observed; this also was statistically nonsignificant. Copyright 2015, SLACK Incorporated.
Hewson, Caroline J.; Dohoo, Ian R.
2006-01-01
Factors affecting the postincisional use of analgesics for ovariohysterectomy (OVH) in dogs and cats were assessed by using data collected from 280 Canadian veterinarians, as part of a national, randomized mail survey (response rate 57.8%). Predictors of analgesic usage identified by logistic regression included the presence of at least 1 animal health technician (AHT) per 2 veterinarians (OR = 2.3, P = 0.004), and the veterinarians’ perception of the pain caused by surgery without analgesia (OR = 1.5, P < 0.001). Linear regression identified the following predictors of veterinarians’ perception of pain: the presence of more than 1 AHT per 2 veterinarians (coefficient = 0.42, P = 0.048) and the number of years since graduation (coefficient = −0.073, P < 0.001). Some of these risk factors are similar to those identified in 1994. The results suggest that continuing education may help to increase analgesic usage. Other important contributors may be client education and a valid method of pain assessment. PMID:16734371
ERIC Educational Resources Information Center
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
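The abstract does not spell out the procedure, so the following is only a sketch of one way such a test could work. It relies on a standard result: with normal errors, an intercept and k predictors, R² follows a Beta(k/2, (n − k − 1)/2) distribution when all slopes are zero. The function name `r2_p_value_by_beta_sampling` and the default draw count and seed are assumptions made for illustration, not the authors' choices.

```python
import random

def r2_p_value_by_beta_sampling(r2_obs, n, k, draws=20000, seed=0):
    """Approximate P(R^2 >= r2_obs) under H0 (all k slopes are zero).

    Assumes a model with an intercept, n observations and normal errors,
    so that R^2 ~ Beta(k/2, (n - k - 1)/2) under H0. The p-value is the
    share of simulated Beta draws at least as large as the observed R^2.
    """
    rng = random.Random(seed)
    shape_a = k / 2
    shape_b = (n - k - 1) / 2
    exceed = sum(rng.betavariate(shape_a, shape_b) >= r2_obs for _ in range(draws))
    return exceed / draws

# Hypothetical example: 20 observations, 3 predictors.
for r2 in (0.10, 0.35, 0.60):
    p = r2_p_value_by_beta_sampling(r2, n=20, k=3)
    print(f"R^2 = {r2:.2f} -> simulated p-value = {p:.4f}")
```

Because the p-value is simulated, its resolution is limited to 1/draws; a larger draw count tightens it at the cost of run time.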
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
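To give a feel for what a power analysis of a correlation test computes, here is a minimal sketch based on the Fisher z approximation. It is not G*Power's algorithm, and the function name `correlation_power` and its arguments are illustrative assumptions.

```python
import math
from statistics import NormalDist

def correlation_power(rho, n, alpha=0.05):
    """Approximate power of the two-sided test of H0: rho = 0.

    Uses the Fisher z transform: atanh(r) is roughly normal with mean
    atanh(rho) and standard error 1/sqrt(n - 3). This is an approximation
    and requires n > 3.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = abs(math.atanh(rho)) * math.sqrt(n - 3)
    # Probability that the test statistic lands in either rejection region.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)

# Hypothetical planning question: how does power grow with sample size
# when the true correlation is 0.3?
for n in (30, 60, 84, 120):
    print(n, round(correlation_power(0.3, n), 3))
```

When rho is 0 the function returns alpha, as it should for a test of the stated size.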
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. 
In particular, turbulence and pressure disturbances caused by chamber deployment are suspected of having produced the unexplained curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
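The underestimation described above can be reproduced on synthetic data. The sketch below assumes a saturating headspace curve c(t) = a + b·exp(−k·t), which is only a stand-in for the authors' exponential model, and all numbers (420, 40, 0.01, the 120 s closure time) are invented. It compares the slope of a straight-line fit with the initial slope dc/dt at t = 0 from the exponential fit.

```python
import math

def fit_line(x, y):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - slope * mx, slope

def fit_exponential(t, c, k_grid):
    """Fit c(t) = a + b*exp(-k*t) by profiling over candidate k values.

    For a fixed k the model is linear in a and b, so each candidate is
    fitted by least squares and the k with the smallest squared error wins.
    """
    best = None
    for k in k_grid:
        z = [math.exp(-k * ti) for ti in t]
        a, b = fit_line(z, c)
        sse = sum((ci - (a + b * zi)) ** 2 for ci, zi in zip(c, z))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    return best[1], best[2], best[3]

# Synthetic, noise-free chamber record: 0 to 120 s in 10 s steps (assumed values).
t = [10.0 * i for i in range(13)]
c = [420.0 - 40.0 * math.exp(-0.01 * ti) for ti in t]

_, linear_flux = fit_line(t, c)                # ppm per second, straight-line fit
k_grid = [0.001 * j for j in range(1, 51)]     # candidate rate constants
a, b, k = fit_exponential(t, c, k_grid)
initial_flux = -k * b                          # dc/dt at t = 0 for the exponential fit

print(f"linear slope:  {linear_flux:.3f} ppm/s")
print(f"initial slope: {initial_flux:.3f} ppm/s")
print(f"ratio:         {linear_flux / initial_flux:.2f}")
```

A grid search keeps the sketch free of external dependencies; a real analysis would use a proper nonlinear least-squares routine and inspect the residuals, as the abstract recommends.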
Graphical Tools for Linear Structural Equation Modeling
2014-06-01
others. Kenny and Milan (2011) write, “Identification is perhaps the most difficult concept for SEM researchers to understand. We have seen SEM...model to using typical SEM software to determine model identifiability. Kenny and Milan (2011) list the following drawbacks: (i) If poor starting...the well known recursive and null rules (Bollen, 1989) and the regression rule (Kenny and Milan, 2011). A Simple Criterion for Identifying Individual
Abbaspour, Seddigheh; Farmanbar, Rabiollah; Njafi, Fateme; Ghiasvand, Arezoo Mohamadkhani; Dehghankar, Leila
2017-01-01
Background: Regular physical activity is considered a form of health promotion, and identifying the psycho-social variables that affect physical activity has proven to be essential. Objective: To identify the relationship between decisional balance and self-efficacy in physical activities using the transtheoretical model in the members of a retirement center in Rasht, Guillen. Methods: A descriptive cross-sectional study was conducted in 2013 by using convenience sampling on 262 elderly people who are members of retirement centers in Rasht. Data were collected using the Stages of change, Decisional balance, Self-efficacy and Physical Activity Scale for the Elderly (PASE) instruments. Data were analyzed using SPSS-16 software with descriptive and analytic statistics (Pearson correlation, Spearman, ANOVA, Tukey HSD, linear and ordinal regression). Results: The majority of participants were in the maintenance stage. The mean and standard deviation of physical activity for the elderly was 119.35±51.50. Stages of change and physical activities were significantly associated with decisional balance and self-efficacy (p<0.0001); however, cons had a significant and reverse association. According to linear and ordinal regression, the only predictor variable of physical activity behavior was self-efficacy. Conclusion: Increasing pros and self-efficacy regarding physical activity can be useful in designing appropriate intervention programs. PMID:28713520
NASA Astrophysics Data System (ADS)
Bradshaw, Tyler; Fu, Rau; Bowen, Stephen; Zhu, Jun; Forrest, Lisa; Jeraj, Robert
2015-07-01
Dose painting relies on the ability of functional imaging to identify resistant tumor subvolumes to be targeted for additional boosting. This work assessed the ability of FDG, FLT, and Cu-ATSM PET imaging to predict the locations of residual FDG PET in canine tumors following radiotherapy. Nineteen canines with spontaneous sinonasal tumors underwent PET/CT imaging with radiotracers FDG, FLT, and Cu-ATSM prior to hypofractionated radiotherapy. Therapy consisted of 10 fractions of 4.2 Gy to the sinonasal cavity with or without an integrated boost of 0.8 Gy to the GTV. Patients had an additional FLT PET/CT scan after fraction 2, a Cu-ATSM PET/CT scan after fraction 3, and follow-up FDG PET/CT scans after radiotherapy. Following image registration, simple and multiple linear and logistic voxel regressions were performed to assess how well pre- and mid-treatment PET imaging predicted post-treatment FDG uptake. R² and pseudo R² were used to assess the goodness of fits. For simple linear regression models, regression coefficients for all pre- and mid-treatment PET images were significantly positive across the population (P < 0.05). However, there was large variability among patients in goodness of fits: R² ranged from 0.00 to 0.85, with a median of 0.12. Results for logistic regression models were similar. Multiple linear regression models resulted in better fits (median R² = 0.31), but there was still large variability between patients in R². The R² from regression models for different predictor variables were highly correlated across patients (R ≈ 0.8), indicating tumors that were poorly predicted with one tracer were also poorly predicted by other tracers. In conclusion, the high inter-patient variability in goodness of fits indicates that PET was able to predict locations of residual tumor in some patients, but not others. This suggests not all patients would be good candidates for dose painting based on a single biological target.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE) and a linear pseudo model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. The present paper, however, introduces a method to compute the NLSE using principles of multivariate calculus. This study is concerned with new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to get a linear pseudo model for a nonlinear regression model. In this article a new technique is developed to get the linear pseudo model for the nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] is explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the LSE (Least-squares estimation) for the fitting of a nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
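The reparameterization idea can be illustrated in a much simpler setting than the one treated in the article. The sketch below fits a fixed-effects-only, piecewise linear spline with one fixed knot. Writing the model as b0 + b1·t + b2·max(t − knot, 0) builds continuity at the knot into the basis, so no explicit side condition is needed. The random effects, the varying polynomial order and the authors' actual parameterization are not reproduced here, and all names and data are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def spline_basis(t, knot):
    """Truncated power basis for a continuous piecewise linear curve."""
    return [1.0, t, max(t - knot, 0.0)]

def fit_linear_spline(ts, ys, knot):
    """Least-squares fit via the normal equations X'X beta = X'y."""
    X = [spline_basis(t, knot) for t in ts]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(XtX, Xty)

# Synthetic profile: slope 1.0 before the knot at t = 4, slope -0.5 after it.
ts = [float(i) for i in range(11)]
ys = [2.0 + 1.0 * t - 1.5 * max(t - 4.0, 0.0) for t in ts]

beta = fit_linear_spline(ts, ys, knot=4.0)
print("intercept, slope, change in slope at the knot:", [round(v, 6) for v in beta])
```

The third coefficient is the change in slope at the knot, so testing whether it is zero is a test of whether the segments differ.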
Optimizing data collection for public health decisions: a data mining approach
2014-01-01
Background: Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results: Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R² values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions: While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
Quality of search strategies reported in systematic reviews published in stereotactic radiosurgery.
Faggion, Clovis M; Wu, Yun-Chun; Tu, Yu-Kang; Wasiak, Jason
2016-06-01
Systematic reviews require comprehensive literature search strategies to avoid publication bias. This study aimed to assess and evaluate the reporting quality of search strategies within systematic reviews published in the field of stereotactic radiosurgery (SRS). Three electronic databases (Ovid MEDLINE®, Ovid EMBASE® and the Cochrane Library) were searched to identify systematic reviews addressing SRS interventions, with the last search performed in October 2014. Manual searches of the reference lists of included systematic reviews were conducted. The search strategies of the included systematic reviews were assessed using a standardized nine-question form based on the Cochrane Collaboration guidelines and Assessment of Multiple Systematic Reviews checklist. Multiple linear regression analyses were performed to identify the important predictors of search quality. A total of 85 systematic reviews were included. The median quality score of search strategies was 2 (interquartile range = 2). Whilst 89% of systematic reviews reported the use of search terms, only 14% of systematic reviews reported searching the grey literature. Multiple linear regression analyses identified publication year (continuous variable), meta-analysis performance and journal impact factor (continuous variable) as predictors of higher mean quality scores. This study identified the urgent need to improve the quality of search strategies within systematic reviews published in the field of SRS. This study is the first to address how authors performed searches to select clinical studies for inclusion in their systematic reviews. Comprehensive and well-implemented search strategies are pivotal to reduce the chance of publication bias and consequently generate more reliable systematic review findings.
Optimizing data collection for public health decisions: a data mining approach.
Partington, Susan N; Papakroni, Vasil; Menzies, Tim
2014-06-12
Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R² values of 92% and 94% for restaurant and grocery store data, respectively. While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost.
Zhou, Lu-Yao; Jiang, Hong; Shan, Quan-Yuan; Chen, Dong; Lin, Xiao-Na; Liu, Bao-Xian; Xie, Xiao-Yan
2017-08-01
To prospectively assess the diagnostic performance of supersonic shear wave elastography (SSWE) in identifying biliary atresia (BA) among infants with conjugated hyperbilirubinaemia by comparing this approach with grey-scale ultrasonography (US). Forty infants were analysed as the control group to determine normal liver stiffness values. The use of SSWE values for identifying BA was investigated in 172 infants suspected of having BA, and results were compared with the results obtained by grey-scale US. The Mann-Whitney U test, unpaired t-test, Spearman correlation and linear regression were also performed. The success rates of SSWE measurements in the control and study group were 100% (40/40) and 96.4% (244/253), respectively. Age, direct bilirubin, and indirect bilirubin all significantly correlated with SSWE in the liver (all P < 0.001). Linear regression showed that age had a greater effect on SSWE values than direct or indirect bilirubin. The diagnostic performance of liver stiffness values in identifying BA was lower than that of grey-scale US (area under the receiver operating characteristic curve [AUC], 0.790 vs 0.893, P < 0.001). SSWE is feasible and valuable in differentiating BA from non-BA. However, its diagnostic performance does not exceed that of grey-scale US. • SSWE could be successfully performed in an infant population. • For infants, the liver stiffness will increase as age increases. • SSWE is potentially useful in assessing infants suspected of biliary atresia. • SSWE is inferior to grey-scale US in identifying biliary atresia.
Identifying risk sources of air contamination by polycyclic aromatic hydrocarbons.
Huzlik, Jiri; Bozek, Frantisek; Pawelczyk, Adam; Licbinsky, Roman; Naplavova, Magdalena; Pondelicek, Michael
2017-09-01
This article addresses the determination of concentrations of polycyclic aromatic hydrocarbons (PAHs) that are sorbed to solid particles in the air. Pollution sources were identified on the basis of the ratio of benzo[ghi]perylene (BghiPe) to benzo[a]pyrene (BaP). Because important information is lost by determining the simple ratio of concentrations, least squares linear regression (classic ordinary least squares regression), reduced major axis, orthogonal regression, and Kendall-Theil robust diagnostics were utilized for identification. Statistical evaluation using all aforementioned methods demonstrated different ratios of the monitored PAHs in the intervals examined during warmer and colder periods. Analogous outputs were provided by comparing gradients of the emission factors acquired from the measured concentrations of BghiPe and BaP in motor vehicle exhaust gases. Based on these outputs, it could plausibly be stated that the influence of burning organic fuels in heating stoves is prevalent in colder periods, whereas in warmer periods transport was the exclusive source because other sources of PAH emissions were not found in the examined locations.
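The four slope estimators named in this abstract can give noticeably different answers on the same data, which is why the choice matters when a slope is read as a concentration ratio. The sketch below implements textbook versions of each using the Python standard library. The concentration values are invented, the function names are illustrative, and the article's own computations may differ in detail.

```python
import math
from statistics import median

def sums_of_squares(x, y):
    """Centered sums sxx, syy and sxy used by the closed-form estimators."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def ols_slope(x, y):
    """Ordinary least squares: minimizes vertical distances."""
    sxx, _, sxy = sums_of_squares(x, y)
    return sxy / sxx

def rma_slope(x, y):
    """Reduced major axis: ratio of standard deviations, signed like the covariance."""
    sxx, syy, sxy = sums_of_squares(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)

def orthogonal_slope(x, y):
    """Orthogonal (major axis) regression: minimizes perpendicular distances.

    Requires a non-zero covariance between x and y.
    """
    sxx, syy, sxy = sums_of_squares(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

def kendall_theil_slope(x, y):
    """Kendall-Theil (Theil-Sen): median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    return median(slopes)

# Hypothetical paired concentrations (arbitrary units): BaP as x, BghiPe as y.
bap = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
bghipe = [0.7, 1.1, 1.9, 2.3, 2.8, 3.6]

for name, estimator in [
    ("OLS", ols_slope),
    ("reduced major axis", rma_slope),
    ("orthogonal", orthogonal_slope),
    ("Kendall-Theil", kendall_theil_slope),
]:
    print(f"{name:>20}: {estimator(bap, bghipe):.3f}")
```

The OLS slope equals the correlation coefficient times the reduced major axis slope, so it is never larger in magnitude; the gap widens as scatter increases.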
GIS Tools to Estimate Average Annual Daily Traffic
DOT National Transportation Integrated Search
2012-06-01
This project presents five tools that were created for a geographical information system to estimate Annual Average Daily Traffic using linear regression. Three of the tools can be used to prepare spatial data for linear regression. One tool can be...
Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini
1999-01-01
Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
Ling, Ru; Liu, Jiawang
2011-12-01
To construct a prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model on medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.
Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...
2015-12-04
Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.
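Of the four sensitivity analysis approaches named above, standardized regression coefficients are the simplest to sketch. The Python example below is a minimal illustration, assuming a plain linear model fitted with NumPy; the function name is hypothetical and nothing here reproduces the Community Land Model setup.

```python
import numpy as np


def standardized_regression_coefficients(X, y):
    """Fit y on X after z-scoring both, so coefficients are comparable across inputs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Z-score each input column and the response (sample standard deviation).
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # After centering, no intercept column is needed.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta
```

A larger absolute coefficient indicates a parameter to which the response is more sensitive, under the assumption that the response is roughly linear in the inputs.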
A decline in the prevalence of injecting drug users in Estonia, 2005–2009
Uusküla, A; Rajaleid, K; Talu, A; Abel-Ollo, K; Des Jarlais, DC
2013-01-01
Aims and setting Descriptions of behavioural epidemics have received little attention compared with infectious disease epidemics in Eastern Europe. Here we report a study aimed at estimating trends in the prevalence of injection drug use between 2005 and 2009 in Estonia. Design and methods The number of injection drug users (IDUs) aged 15–44 each year between 2005 and 2009 was estimated using capture-recapture methodology based on 4 data sources (2 treatment databases: drug abuse and non-fatal overdose treatment; criminal justice (drug related offences) and mortality (injection drug use related deaths) data). Poisson log-linear regression models were applied to the matched data, with interactions between data sources fitted to replicate the dependencies between the data sources. Linear regression was used to estimate average change over time. Findings There were 24305, 12292, 238, 545 records and 8100, 1655, 155, 545 individual IDUs identified in the four capture sources (police, drug treatment, overdose, and death registry, respectively) over the period 2005 – 2009. The estimated prevalence of IDUs among the population aged 15–44 declined from 2.7% (1.8–7.9%) in 2005 to 2.0% (1.4–5.0%) in 2008, and 0.9% (0.7–1.7%) in 2009. Regression analysis indicated an average reduction of over 1700 injectors per year. Conclusion While the capture-recapture method has known limitations, the results are consistent with other data from Estonia. Identifying the drivers of change in the prevalence of injection drug use warrants further research. PMID:23290632
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fit over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fit of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correction models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
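For reference, the two fixed-exponent formulas attributed to Bazett and Fridericia divide the QT interval by the square root and the cube root of the RR interval, respectively. The Python sketch below assumes QT in milliseconds and RR in seconds; the function names are hypothetical and the formulas are the standard textbook forms, not code from the study.

```python
def qtc_bazett(qt_ms, rr_s):
    """Bazett correction: QT divided by the square root of RR (RR in seconds)."""
    return qt_ms / rr_s ** 0.5


def qtc_fridericia(qt_ms, rr_s):
    """Fridericia correction: QT divided by the cube root of RR (RR in seconds)."""
    return qt_ms / rr_s ** (1.0 / 3.0)
```

Both formulas leave QT unchanged at RR = 1 s and diverge as the heart rate moves away from 60 beats per minute, which is where a fixed exponent can over- or under-correct.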
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Wood, Douglas R.; Burger, L. Wesley; Vilella, Francisco
2014-01-01
We investigated the relationship between red-cockaded woodpecker (Picoides borealis) reproductive success and microhabitat characteristics in a southeastern loblolly (Pinus taeda) and shortleaf (P. echinata) pine forest. From 1997 to 1999, we recorded reproductive success parameters of 41 red-cockaded woodpecker groups at the Bienville National Forest, Mississippi. Microhabitat characteristics were measured for each group during the nesting season. Logistic regression identified understory vegetation height and small nesting season home range size as predictors of red-cockaded woodpecker nest attempts. Linear regression models identified several variables as predictors of red-cockaded woodpecker reproductive success including group density, reduced hardwood component, small nesting season home range size, and shorter foraging distances. Red-cockaded woodpecker reproductive success was correlated with habitat and behavioral characteristics that emphasize high quality habitat. By providing high quality foraging habitat during the nesting season, red-cockaded woodpeckers can successfully reproduce within small home ranges.
Fisher, Charles K; Mehta, Pankaj
2015-06-01
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, are freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
NASA Astrophysics Data System (ADS)
Denli, H. H.; Koc, Z.
2015-12-01
Estimation of real property values based on standards is difficult to apply across time and location. Regression analysis constructs mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. When regression analysis is applied to real estate valuation, using properties presented in the real market with their current characteristics and quantifiers, the method helps to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with their characteristics to find a price index, based on information received from a real estate web page. The variables used for the analysis are age, size in m2, number of floors in the building, floor number of the estate and number of rooms. The price of the estate is the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Locations with the same price value have been found and plotted on the map, and equivalence curves have been drawn identifying the same-valued zones as lines.
Understanding Preprocedure Patient Flow in IR.
Zafar, Abdul Mueed; Suri, Rajeev; Nguyen, Tran Khanh; Petrash, Carson Cope; Fazal, Zanira
2016-08-01
To quantify preprocedural patient flow in interventional radiology (IR) and to identify potential contributors to preprocedural delays. An administrative dataset was used to compute time intervals required for various preprocedural patient-flow processes. These time intervals were compared across on-time/delayed cases and inpatient/outpatient cases by Mann-Whitney U test. Spearman ρ was used to assess any correlation of the rank of a procedure on a given day and the procedure duration to the preprocedure time. A linear-regression model of preprocedure time was used to further explore potential contributing factors. Any identified reason(s) for delay were collated. P < .05 was considered statistically significant. Of the total 1,091 cases, 65.8% (n = 718) were delayed. Significantly more outpatient cases started late compared with inpatient cases (81.4% vs 45.0%; P < .001, χ² test). The multivariate linear regression model showed outpatient status, length of delay in arrival, and longer procedure times to be significantly associated with longer preprocedure times. Late arrival of patients (65.9%), unavailability of physicians (18.4%), and unavailability of procedure room (13.0%) were the three most frequently identified reasons for delay. The delay was multifactorial in 29.6% of cases (n = 213). Objective measurement of preprocedural IR patient flow demonstrated considerable waste and highlighted high-yield areas of possible improvement. A data-driven approach may aid efficient delivery of IR care. Copyright © 2016 SIR. Published by Elsevier Inc. All rights reserved.
Cacciatore, Francesco; Della-Morte, David; Basile, Claudia; Curcio, Francesco; Liguori, Ilaria; Roselli, Mario; Gargiulo, Gaetano; Galizia, Gianluigi; Bonaduce, Domenico; Abete, Pasquale
2015-01-01
To determine the relationship between Butyryl-cholinesterase (α-glycoprotein synthesized in the liver, b-CHE) and muscle mass and strength. Muscle mass by bioimpedentiometer and muscle strength by grip strength were evaluated in 337 elderly subjects (mean age: 76.2 ± 6.7 years) admitted to comprehensive geriatric assessment. b-CHE levels were lower in sarcopenic than in nonsarcopenic elderly subjects (p < 0.01). Linear regression analysis demonstrated that b-CHE is linearly related with grip strength and muscular mass both in men and women (r = 0.45 and r = 0.33, p < 0.01; r = 0.55 and r = 0.39, p < 0.01; respectively). Multivariate analysis confirms this analysis. b-CHE is related to muscle mass and strength in elderly subjects. Thus, b-CHE may be considered to be a fair biomarker for identifying elderly subjects at risk of sarcopenia.
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
Kanamori, Shogo; Castro, Marcia C.; Sow, Seydou; Matsuno, Rui; Cissokho, Alioune; Jimba, Masamine
2016-01-01
Background The 5S method is a lean management tool for workplace organization, with 5S being an abbreviation for five Japanese words that translate to English as Sort, Set in Order, Shine, Standardize, and Sustain. In Senegal, the 5S intervention program was implemented in 10 health centers in two regions between 2011 and 2014. Objective To identify the impact of the 5S intervention program on the satisfaction of clients (patients and caretakers) who visited the health centers. Design A standardized 5S intervention protocol was implemented in the health centers using a quasi-experimental separate pre-post samples design (four intervention and three control health facilities). A questionnaire with 10 five-point Likert items was used to measure client satisfaction. Linear regression analysis was conducted to identify the intervention's effect on the client satisfaction scores, represented by an equally weighted average of the 10 Likert items (Cronbach's alpha=0.83). Additional regression analyses were conducted to identify the intervention's effect on the scores of each Likert item. Results Backward stepwise linear regression (n=1,928) indicated a statistically significant effect of the 5S intervention, represented by an increase of 0.19 points in the client satisfaction scores in the intervention group, 6 to 8 months after the intervention (p=0.014). Additional regression analyses showed significant score increases of 0.44 (p=0.002), 0.14 (p=0.002), 0.06 (p=0.019), and 0.17 (p=0.044) points on four items, which, respectively were healthcare staff members’ communication, explanations about illnesses or cases, and consultation duration, and clients’ overall satisfaction. Conclusions The 5S has the potential to improve client satisfaction at resource-poor health facilities and could therefore be recommended as a strategic option for improving the quality of healthcare service in low- and middle-income countries. 
To explore more effective intervention modalities, further studies need to address the mechanisms by which 5S leads to attitude changes in healthcare staff. PMID:27900932
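The internal-consistency figure quoted above (Cronbach's alpha) can be computed from item-level responses with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The Python sketch below is a minimal illustration; the function name and the row-per-respondent input layout are assumptions, and no survey data from the study are used.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When every item moves in lockstep across respondents the value reaches 1; weaker agreement between items lowers it.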
Analyzing industrial energy use through ordinary least squares regression models
NASA Astrophysics Data System (ADS)
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. 
With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
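A change-point baseline model of the kind described can be sketched in a few lines. The Python example below assumes a three-parameter cooling form, energy = base load + slope * max(T - change point, 0), and picks the change point by grid search over candidate temperatures with ordinary least squares at each candidate. The model form, the function name, and the grid-search strategy are assumptions for illustration; the thesis itself relies on spreadsheet methods and may use a different formulation.

```python
import numpy as np


def fit_cooling_change_point(temps, energy, candidates):
    """Grid-search a three-parameter cooling change-point model with OLS."""
    temps = np.asarray(temps, dtype=float)
    energy = np.asarray(energy, dtype=float)
    best = None
    for tcp in candidates:
        # Design matrix: intercept (base load) and degrees above the change point.
        A = np.column_stack([np.ones_like(temps), np.maximum(temps - tcp, 0.0)])
        coef, *_ = np.linalg.lstsq(A, energy, rcond=None)
        sse = float(np.sum((A @ coef - energy) ** 2))
        # Keep the candidate with the smallest sum of squared errors.
        if best is None or sse < best[0]:
            best = (sse, float(tcp), float(coef[0]), float(coef[1]))
    return {"sse": best[0], "change_point": best[1],
            "base_load": best[2], "slope": best[3]}
```

Production or other drivers could be added as further columns of the design matrix in the same way.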
Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina
2013-01-01
Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crime data, it is common that the data consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rate forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
Pütter, Carolin; Pechlivanis, Sonali; Nöthen, Markus M; Jöckel, Karl-Heinz; Wichmann, Heinz-Erich; Scherag, André
2011-01-01
Genome-wide association studies have identified robust associations between single nucleotide polymorphisms and complex traits. As the proportion of phenotypic variance explained is still limited for most of the traits, larger and larger meta-analyses are being conducted to detect additional associations. Here we investigate the impact of the study design and the underlying assumption about the true genetic effect in a bimodal mixture situation on the power to detect associations. We performed simulations of quantitative phenotypes analysed by standard linear regression and dichotomized case-control data sets from the extremes of the quantitative trait analysed by standard logistic regression. Using linear regression, markers with an effect in the extremes of the traits were almost undetectable, whereas analysing extremes by case-control design had superior power even for much smaller sample sizes. Two real data examples are provided to support our theoretical findings and to explore our mixture and parameter assumption. Our findings support the idea to re-analyse the available meta-analysis data sets to detect new loci in the extremes. Moreover, our investigation offers an explanation for discrepant findings when analysing quantitative traits in the general population and in the extremes. Copyright © 2011 S. Karger AG, Basel.
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
A simplified competition data analysis for radioligand specific activity determination.
Venturino, A; Rivera, E S; Bergoc, R M; Caro, R A
1990-01-01
Non-linear regression and two-step linear fit methods were developed to determine the actual specific activity of 125I-ovine prolactin by radioreceptor self-displacement analysis. The experimental results obtained by the different methods are superposable. The non-linear regression method is considered to be the most adequate procedure to calculate the specific activity, but if its software is not available, the other described methods are also suitable.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements used to track nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a combination of these factors may prevent accurate estimation or measurement; in those cases, height and weight can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations whose coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than those obtained with conventional linear regressions. In all the cases, the predictions are insensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study, two techniques for modeling electricity consumption of the Jordanian industrial sector are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as a function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have a significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, a comparison based on the root mean squared error of the data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman
2011-01-01
This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition to that, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, we believe that we are probably the first to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626
"Mad or bad?": burden on caregivers of patients with personality disorders.
Bauer, Rita; Döring, Antje; Schmidt, Tanja; Spießl, Hermann
2012-12-01
The burden on caregivers of patients with personality disorders is often greatly underestimated or completely disregarded. Possibilities for caregiver support have rarely been assessed. Thirty interviews were conducted with caregivers of such patients to assess illness-related burden. Responses were analyzed with a mixed method of qualitative and quantitative analysis in a sequential design. Patient and caregiver data, including sociodemographic and disease-related variables, were evaluated with regression analysis and regression trees. Caregiver statements (n = 404) were summarized into 44 global statements. The most frequent global statements were worries about the burden on other family members (70.0%), poor cooperation with clinical centers and other institutions (60.0%), financial burden (56.7%), worry about the patient's future (53.3%), and dissatisfaction with the patient's treatment and rehabilitation (53.3%). Linear regression and regression tree analysis identified predictors for more burdened caregivers. Caregivers of patients with personality disorders experience a variety of burdens, some disorder specific. Yet these caregivers often receive little attention or support.
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
NASA Astrophysics Data System (ADS)
Gorgees, HazimMansoor; Mahdi, FatimahAssim
2018-05-01
This article compares the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other stated methods, since it has a smaller mean square error (MSE).
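As a rough illustration of the ridge estimator this abstract refers to, the sketch below computes ridge coefficients on standardized predictors and reports the condition number of the cross-product matrix. The abstract does not say how the ridge constant is derived from the condition number, so the value `k = 1.0`, the function names, and the synthetic data are all assumptions made for this example only.

```python
import numpy as np

def standardize(X):
    # Center and scale each predictor column.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_coefficients(X, y, k):
    # Ordinary ridge estimator (X'X + kI)^-1 X'y on standardized predictors.
    Xs = standardize(X)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ yc)

def condition_number(X):
    # Ratio of largest to smallest eigenvalue of X'X (standardized X).
    Xs = standardize(X)
    eigenvalues = np.linalg.eigvalsh(Xs.T @ Xs)
    return eigenvalues.max() / eigenvalues.min()

# Synthetic, nearly collinear predictors (not the data used in the article).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=200)

beta_ols = ridge_coefficients(X, y, 0.0)    # k = 0 reduces to least squares
beta_ridge = ridge_coefficients(X, y, 1.0)  # k = 1.0 is an arbitrary choice
print(condition_number(X), beta_ols, beta_ridge)
```

With nearly collinear columns the condition number is large and the least-squares coefficients are unstable; any positive `k` shrinks the coefficient vector toward zero.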
Prediction of dynamical systems by symbolic regression
NASA Astrophysics Data System (ADS)
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
Detection of epistatic effects with logic regression and a classical linear regression model.
Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata
2014-02-01
To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as disease) is caused by Boolean combinations of genotypes of several QTLs, Cockerham's approach is often unable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though a larger number of models has to be considered with the logic regression approach (requiring more stringent multiple testing correction), the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and real data analysis.
Landscape controls on total and methyl Hg in the Upper Hudson River basin, New York, USA
Burns, Douglas A.; Riva-Murray, K.; Bradley, P.M.; Aiken, G.R.; Brigham, M.E.
2012-01-01
Approaches are needed to better predict spatial variation in riverine Hg concentrations across heterogeneous landscapes that include mountains, wetlands, and open waters. We applied multivariate linear regression to determine the landscape factors and chemical variables that best account for the spatial variation of total Hg (THg) and methyl Hg (MeHg) concentrations in 27 sub-basins across the 493 km² upper Hudson River basin in the Adirondack Mountains of New York. THg concentrations varied by sixfold, and those of MeHg by 40-fold in synoptic samples collected at low-to-moderate flow, during spring and summer of 2006 and 2008. Bivariate linear regression relations of THg and MeHg concentrations with either percent wetland area or DOC concentrations were significant but could account for only about 1/3 of the variation in these Hg forms in summer. In contrast, multivariate linear regression relations that included metrics of (1) hydrogeomorphology, (2) riparian/wetland area, and (3) open water, explained about 66% to >90% of spatial variation in each Hg form in spring and summer samples. These metrics reflect the influence of basin morphometry and riparian soils on Hg source and transport, and the role of open water as a Hg sink. Multivariate models based solely on these landscape metrics generally accounted for as much or more of the variation in Hg concentrations than models based on chemical and physical metrics, and show great promise for identifying waters with expected high Hg concentrations in the Adirondack region and similar glaciated riverine ecosystems.
Vyskocil, Erich; Gruther, Wolfgang; Steiner, Irene; Schuhfried, Othmar
2014-07-01
Disease-specific categories of the International Classification of Functioning, Disability and Health have not yet been described for patients with chronic peripheral arterial obstructive disease (PAD). The authors examined the relationship between the categories of the Brief Core Sets for ischemic heart diseases with the Peripheral Artery Questionnaire and the ankle-brachial index to determine which International Classification of Functioning, Disability and Health categories are most relevant for patients with PAD. This is a retrospective cohort study including 77 patients with verified PAD. Statistical analyses of the relationship between International Classification of Functioning, Disability and Health categories as independent variables and the endpoints Peripheral Artery Questionnaire or ankle-brachial index were carried out by simple and stepwise linear regression models adjusting for age, sex, and leg (left vs. right). The stepwise linear regression model with the ankle-brachial index as dependent variable revealed a significant effect of the variables blood vessel functions and muscle endurance functions. Calculating a stepwise linear regression model with the Peripheral Artery Questionnaire as dependent variable, a significant effect of age, emotional functions, energy and drive functions, carrying out daily routine, as well as walking could be observed. This study identifies International Classification of Functioning, Disability and Health categories in the Brief Core Sets for ischemic heart diseases that show a significant effect on the ankle-brachial index and the Peripheral Artery Questionnaire score in patients with PAD. These categories provide fundamental information on functioning of patients with PAD and patient-centered outcomes for rehabilitation interventions.
Ding, Changfeng; Li, Xiaogang; Zhang, Taolin; Ma, Yibing; Wang, Xingxiang
2014-10-01
Soil environmental quality standards in respect of heavy metals for farmlands should be established considering both their effects on crop yield and their accumulation in the edible part. A greenhouse experiment was conducted to investigate the effects of chromium (Cr) on biomass production and Cr accumulation in carrot plants grown in a wide range of soils. The results revealed that carrot yield significantly decreased in 18 of the total 20 soils when Cr was added at the level of the soil environmental quality standard of China. The Cr content of carrot grown in the five soils with pH > 8.0 exceeded the maximum allowable level (0.5 mg kg⁻¹) according to the Chinese General Standard for Contaminants in Foods. The relationship between carrot Cr concentration and soil pH could be well fitted (R² = 0.70, P < 0.0001) by a linear-linear segmented regression model. The addition of Cr to soil affected carrot yield before it affected food quality. The major soil factors controlling Cr phytotoxicity and the prediction models were further identified and developed using path analysis and stepwise multiple linear regression analysis. Soil Cr thresholds for phytotoxicity that also ensure food safety were then derived on the condition of 10 percent yield reduction. Copyright © 2014 Elsevier Inc. All rights reserved.
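The linear-linear segmented regression mentioned above can be sketched as a two-piece line with one breakpoint. The version below is only one simple way to fit such a model (a grid search over candidate breakpoints with ordinary least squares at each candidate); the abstract does not state how the authors estimated theirs. The function name, the candidate grid, and the noise-free synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_segmented(x, y, candidates):
    # Fit y = b0 + b1*x + b2*max(x - c, 0) for each candidate breakpoint c
    # and keep the candidate with the smallest sum of squared errors.
    best = None
    for c in candidates:
        A = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = float(np.sum((y - A @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, c, coef)
    return best[1], best[2]

# Synthetic example: flat response up to x = 8, then a rising segment.
# These numbers are invented and are not the study's measurements.
x = np.linspace(4.0, 9.0, 51)
y = 0.1 + 0.5 * np.maximum(x - 8.0, 0.0)

breakpoint, coef = fit_segmented(x, y, np.linspace(5.0, 8.5, 36))
print(breakpoint, coef)
```

The change in slope after the breakpoint is the third coefficient, so the slope of the second segment is `coef[1] + coef[2]`.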
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Alzheimer's Disease Detection by Pseudo Zernike Moment and Linear Regression Classification.
Wang, Shui-Hua; Du, Sidan; Zhang, Yin; Phillips, Preetha; Wu, Le-Nan; Chen, Xian-Qing; Zhang, Yu-Dong
2017-01-01
This study presents an improved method based on "Gorji et al. Neuroscience. 2015" by introducing a relatively new classifier: linear regression classification. Our method selects one axial slice from a 3D brain image and employs pseudo Zernike moments with a maximum order of 15 to extract 256 features from each image. Finally, linear regression classification is used as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Lopez, David S; Advani, Shailesh; Qiu, Xueting; Tsilidis, Konstantinos K; Khera, Mohit; Kim, Jeri; Canfield, Steven
2018-04-25
The association of caffeine intake with testosterone remains unclear. We evaluated the association of caffeine intake with serum testosterone among American men and determined whether this association varied by race/ethnicity and measurements of adiposity. Data were analyzed for 2581 men (≥20 years old) who participated in the cycles of the NHANES 1999-2004 and 2011-2012, a cross-sectional study. Testosterone (ng/mL) was measured by immunoassay among men who participated in the morning examination session. We analyzed 24-h dietary recall data to estimate caffeine intake (mg/day). Multivariable weighted linear regression models were conducted. We identified no linear relationship between caffeine intake and testosterone levels in the total population, but there was a non-linear association (p nonlinearity < .01). Similarly, stratified analysis showed nonlinear associations among Mexican-American and Non-Hispanic White men (p nonlinearity ≤ .03 both) and only among men with waist circumference <102 cm and body mass index <25 kg/m² (p nonlinearity < .01, both). No linear association was identified between levels of caffeine intake and testosterone in US men, but we observed a non-linear association, including among racial/ethnic groups and measurements of adiposity in this cross-sectional study. These associations warrant investigation in larger prospective studies.
Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C
2011-09-01
Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
2013-01-01
application of the Hammett equation with the constants rph in the chemistry of organophosphorus compounds, Russ. Chem. Rev. 38 (1969) 795–811. [13...of oximes and OP compounds and the ability of oximes to reactivate OP- inhibited AChE. Multiple linear regression equations were analyzed using...phosphonate pairs, 21 oxime/ phosphoramidate pairs and 12 oxime/phosphate pairs. The best linear regression equation resulting from multiple regression anal
Knowledge, Attitude, and Practices Regarding Vector-borne Diseases in Western Jamaica.
Alobuia, Wilson M; Missikpode, Celestin; Aung, Maung; Jolly, Pauline E
2015-01-01
Outbreaks of vector-borne diseases (VBDs) such as dengue and malaria can overwhelm health systems in resource-poor countries. Environmental management strategies that reduce or eliminate vector breeding sites combined with improved personal prevention strategies can help to significantly reduce transmission of these infections. The aim of this study was to assess the knowledge, attitudes, and practices (KAPs) of residents in western Jamaica regarding control of mosquito vectors and protection from mosquito bites. A cross-sectional study was conducted between May and August 2010 among patients or family members of patients waiting to be seen at hospitals in western Jamaica. Participants completed an interviewer-administered questionnaire on sociodemographic factors and KAPs regarding VBDs. KAP scores were calculated and categorized as high or low based on the number of correct or positive responses. Logistic regression analyses were conducted to identify predictors of KAP and linear regression analysis conducted to determine if knowledge and attitude scores predicted practice scores. In all, 361 (85 men and 276 women) people participated in the study. Most participants (87%) scored low on knowledge and practice items (78%). Conversely, 78% scored high on attitude items. By multivariate logistic regression, housewives were 82% less likely than laborers to have high attitude scores; homeowners were 65% less likely than renters to have high attitude scores. Participants from households with 1 to 2 children were 3.4 times more likely to have high attitude scores compared with those from households with no children. Participants from households with at least 5 people were 65% less likely than those from households with fewer than 5 people to have high practice scores. By multivariable linear regression knowledge and attitude scores were significant predictors of practice score. The study revealed poor knowledge of VBDs and poor prevention practices among participants. 
It identified specific groups that can be targeted with vector control and personal protection interventions to decrease transmission of the infections. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Rogers, R. H. (Principal Investigator)
1976-01-01
The author has identified the following significant results. Computer techniques were developed for mapping water quality parameters from LANDSAT data, using surface samples collected in an ongoing survey of water quality in Saginaw Bay. Chemical and biological parameters were measured on 31 July 1975 at 16 bay stations in concert with the LANDSAT overflight. Application of stepwise linear regression bands to nine of these parameters and corresponding LANDSAT measurements for bands 4 and 5 only resulted in regression correlation coefficients that varied from 0.94 for temperature to 0.73 for Secchi depth. Regression equations expressed with the pair of bands 4 and 5, rather than the ratio band 4/band 5, provided higher correlation coefficients for all the water quality parameters studied (temperature, Secchi depth, chloride, conductivity, total kjeldahl nitrogen, total phosphorus, chlorophyll a, total solids, and suspended solids).
Healthy life expectancy in Hong Kong Special Administrative Region of China.
Law, C. K.; Yip, P. S. F.
2003-01-01
Sullivan's method and a regression model were used to calculate healthy life expectancy (HALE) for men and women in Hong Kong Special Administrative Region (Hong Kong SAR) of China. These methods need estimates of the prevalence and information on disability distributions of 109 diseases and HALE for 191 countries by age, sex and region of the world from the WHO's health assessment of 2000. The population of Hong Kong SAR has one of the highest healthy life expectancies in the world. Sullivan's method gives higher estimates than the classic linear regression method. Although Sullivan's method accurately calculates the influence of disease prevalence within small areas and regions, the regression method can approximate HALE for all economies for which information on life expectancy is available. This paper identifies some problems of the two methods and discusses the accuracy of estimates of HALE that rely on data from the WHO assessment. PMID:12640475
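Sullivan's method is usually described as weighting the person-years lived in each age interval of a life table by the proportion of that interval spent free of ill health, then dividing by the number of survivors at the starting age. The sketch below follows that textbook description; the function name and the three-interval life table are invented for illustration and are not the Hong Kong SAR or WHO figures.

```python
def sullivan_hale(person_years, prevalence, survivors_at_start):
    # person_years: life-table person-years lived in each age interval
    # prevalence: proportion of each interval spent in ill health
    # survivors_at_start: life-table survivors at the starting age
    healthy_years = sum(L * (1.0 - p) for L, p in zip(person_years, prevalence))
    return healthy_years / survivors_at_start

# Hypothetical life table with three broad age intervals.
person_years = [3_000_000, 2_500_000, 1_500_000]
prevalence = [0.0, 0.1, 0.4]
survivors = 100_000

life_expectancy = sum(person_years) / survivors
hale = sullivan_hale(person_years, prevalence, survivors)
print(life_expectancy, hale)
```

In this made-up table, life expectancy is 70 years and the healthy portion is 61.5 years; the gap is the expected time spent in ill health.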
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary
NASA Astrophysics Data System (ADS)
Gillis, Nicolas; Luce, Robert
2018-01-01
A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
Specialization Agreements in the Council for Mutual Economic Assistance
1988-02-01
proportions to stabilize variance (S. Weisberg, Applied Linear Regression, 2nd ed., John Wiley & Sons, New York, 1985, p. 134). If the dependent...27, 1986, p. 3. Weisberg, S., Applied Linear Regression, 2nd ed., John Wiley & Sons, New York, 1985, p. 134. Wiles, P. J., Communist International
Radio Propagation Prediction Software for Complex Mixed Path Physical Channels
2006-08-14
63 4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz 69 4.4.7. Projected Scaling to...4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz In order to construct a comprehensive numerical algorithm capable of
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Data Transformations for Inference with Linear Regression: Clarifications and Recommendations
ERIC Educational Resources Information Center
Pek, Jolynn; Wong, Octavia; Wong, C. M.
2017-01-01
Data transformations have been promoted as a popular and easy-to-implement remedy to address the assumption of normally distributed errors (in the population) in linear regression. However, the application of data transformations introduces non-ignorable complexities which should be fully appreciated before their implementation. This paper adds to…
USING LINEAR AND POLYNOMIAL MODELS TO EXAMINE THE ENVIRONMENTAL STABILITY OF VIRUSES
The article presents the development of model equations for describing the fate of viral infectivity in environmental samples. Most of the models were based upon the use of a two-step linear regression approach. The first step employs regression of log base 10 transformed viral t...
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
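One closed-form result of the kind this abstract alludes to is the standard error of the slope in simple linear regression, roughly sigma / (sd_x * sqrt(n)), where sigma is the residual standard deviation and sd_x the standard deviation of the predictor. Solving for n gives a quick planning calculation. The sketch below uses that standard textbook formula; it is not necessarily the rearrangement presented in the article, and the function name and example numbers are assumptions.

```python
import math

def n_for_slope_se(sigma, sd_x, target_se):
    # Smallest n for which sigma / (sd_x * sqrt(n)) <= target_se,
    # i.e. n >= (sigma / (sd_x * target_se)) ** 2.
    return math.ceil((sigma / (sd_x * target_se)) ** 2)

# Hypothetical planning values: residual SD 10, predictor SD 2,
# and a desired standard error of 0.5 for the slope.
n_required = n_for_slope_se(sigma=10.0, sd_x=2.0, target_se=0.5)
print(n_required)
```

Halving the target standard error quadruples the required sample size, which is why precision targets rather than a fixed subjects-per-variable ratio drive the calculation.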
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
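Locally weighted linear regression, the final step named in this abstract, fits a separate weighted least-squares line for every query point, giving nearby training points more weight. The sketch below uses a Gaussian kernel with bandwidth `tau`; the kernel choice, the bandwidth value, and the function name are assumptions, and the feature construction step of FCLWLR is not reproduced here.

```python
import numpy as np

def lwlr_predict(x_query, X, y, tau=0.3):
    # X has shape (n, d); x_query is a scalar (d = 1) or a length-d vector.
    Xb = np.column_stack([np.ones(len(X)), X])           # add intercept column
    q = np.concatenate([[1.0], np.atleast_1d(x_query)])  # query with intercept
    d2 = np.sum((X - x_query) ** 2, axis=1)               # squared distances
    w = np.exp(-d2 / (2.0 * tau ** 2))                    # Gaussian weights
    WX = Xb * w[:, None]
    theta = np.linalg.solve(Xb.T @ WX, WX.T @ y)          # solve X'WX theta = X'Wy
    return float(q @ theta)

# Synthetic one-dimensional data (not rating data from the paper).
X = np.linspace(0.0, 6.0, 61).reshape(-1, 1)
y_curve = np.sin(X[:, 0])
y_line = 2.0 * X[:, 0] + 1.0

pred_curve = lwlr_predict(1.5, X, y_curve)
pred_line = lwlr_predict(3.0, X, y_line)
print(pred_curve, pred_line)
```

A smaller `tau` follows local structure more closely at the cost of higher variance, so in practice the bandwidth would be tuned, for example by cross-validation.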
Fear of falling in older adults living at home: associated factors.
Vitorino, Luciano Magalhães; Teixeira, Carla Araujo Bastos; Boas, Eliandra Laís Vilas; Pereira, Rúbia Lopes; Santos, Naiana Oliveira Dos; Rozendo, Célia Alves
2017-04-10
To identify the factors associated with the fear of falling in older adults living at home. Cross-sectional study with probabilistic sampling of older adults enrolled in two Family Health Strategies (FHS). The fear of falling was measured by the Brazilian version of the Falls Efficacy Scale-International and by a household questionnaire that contained the explanatory variables. Multiple Linear Regression using the stepwise selection technique and Generalized Linear Models were used in the statistical analyses. A total of 170 older adults participated in the research, 85 from each FHS. The majority (57.1%) were aged between 60 and 69; 67.6% were female; 46.1% fell once in the last year. The majority of the older adults (66.5%) had a high fear of falling. In the final multiple linear regression model, a higher number of previous falls, female gender, older age, and worse health self-assessment explained 37% of the fear of falling among older adults. The findings reinforce the need to assess the fear of falling among older adults living at home, in conjunction with the development and use by professionals of strategies based on modifiable factors to reduce falls and improve health status, which may contribute to reducing the fear of falling among older adults.
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Physical activity, sedentary behavior, and vitamin D metabolites.
Hibler, Elizabeth A; Sardo Molmenti, Christine L; Dai, Qi; Kohler, Lindsay N; Warren Anderson, Shaneda; Jurutka, Peter W; Jacobs, Elizabeth T
2016-02-01
Physical activity is associated with circulating 25-hydroxyvitamin D (25(OH)D). However, the influence of activity and/or sedentary behavior on the biologically active, seco-steroid hormone 1α,25-dihydroxyvitamin D (1,25(OH)2D) is unknown. We conducted a cross-sectional analysis among ursodeoxycholic acid (UDCA) randomized trial participants (n=876) to evaluate associations between physical activity, sedentary behavior, and circulating vitamin D metabolite concentrations. Continuous vitamin D metabolite measurements and clinical thresholds were evaluated using multiple linear and logistic regression models, mutually adjusted for either 1,25(OH)2D or 25(OH)D and additional confounding factors. A statistically significant linear association between 1,25(OH)2D and moderate-vigorous physical activity per week was strongest among women (β (95% CI): 3.10 (1.51-6.35)) versus men (β (95% CI): 1.35 (0.79-2.29)) in the highest tertile of activity compared to the lowest (p-interaction=0.003). Furthermore, 25(OH)D was 1.54ng/ml (95% CI 1.09-1.98) higher per hour increase in moderate-vigorous activity (p=0.001) and odds of sufficient 25(OH)D status was higher among physically active participants (p=0.001). Sedentary behavior was not significantly associated with either metabolite in linear regression models, nor was a statistically significant interaction by sex identified. The current study identified novel associations between physical activity and serum 1,25(OH)2D levels, adjusted for 25(OH)D concentrations. These results identify the biologically active form of vitamin D as a potential physiologic mechanism related to observed population-level associations between moderate-vigorous physical activity with bone health and chronic disease risk. However, future longitudinal studies are needed to further evaluate the role of physical activity and vitamin D metabolites in chronic disease prevention. Copyright © 2015 Elsevier Inc. All rights reserved.
A Multiomics Approach to Identify Genes Associated with Childhood Asthma Risk and Morbidity.
Forno, Erick; Wang, Ting; Yan, Qi; Brehm, John; Acosta-Perez, Edna; Colon-Semidey, Angel; Alvarez, Maria; Boutaoui, Nadia; Cloutier, Michelle M; Alcorn, John F; Canino, Glorisa; Chen, Wei; Celedón, Juan C
2017-10-01
Childhood asthma is a complex disease. In this study, we aim to identify genes associated with childhood asthma through a multiomics "vertical" approach that integrates multiple analytical steps using linear and logistic regression models. In a case-control study of childhood asthma in Puerto Ricans (n = 1,127), we used adjusted linear or logistic regression models to evaluate associations between several analytical steps of omics data, including genome-wide (GW) genotype data, GW methylation, GW expression profiling, cytokine levels, asthma-intermediate phenotypes, and asthma status. At each point, only the top genes/single-nucleotide polymorphisms/probes/cytokines were carried forward for subsequent analysis. In step 1, asthma modified the gene expression-protein level association for 1,645 genes; pathway analysis showed an enrichment of these genes in the cytokine signaling system (n = 269 genes). In steps 2-3, expression levels of 40 genes were associated with intermediate phenotypes (asthma onset age, forced expiratory volume in 1 second, exacerbations, eosinophil counts, and skin test reactivity); of those, methylation of seven genes was also associated with asthma. Of these seven candidate genes, IL5RA was also significant in analytical steps 4-8. We then measured plasma IL-5 receptor α levels, which were associated with asthma age of onset and moderate-severe exacerbations. In addition, in silico database analysis showed that several of our identified IL5RA single-nucleotide polymorphisms are associated with transcription factors related to asthma and atopy. This approach integrates several analytical steps and is able to identify biologically relevant asthma-related genes, such as IL5RA. It differs from other methods that rely on complex statistical models with various assumptions.
Serum Albumin and Disease Severity of Non-Cystic Fibrosis Bronchiectasis.
Lee, Seung Jun; Kim, Hyo-Jung; Kim, Ju-Young; Ju, Sunmi; Lim, Sujin; Yoo, Jung Wan; Nam, Sung-Jin; Lee, Gi Dong; Cho, Hyun Seop; Kim, Rock Bum; Cho, Yu Ji; Jeong, Yi Yeong; Kim, Ho Cheol; Lee, Jong Deog
2017-08-01
A clinical classification system has been developed to define the severity and predict the prognosis of subjects with non-cystic fibrosis (CF) bronchiectasis. We aimed to identify laboratory parameters that are correlated with the bronchiectasis severity index (BSI) and FACED score. The medical records of 107 subjects with non-CF bronchiectasis for whom BSI and FACED scores could be calculated were retrospectively reviewed. The correlations between the laboratory parameters and BSI or FACED score were assessed, and multiple-linear regression analysis was performed to identify variables independently associated with BSI and FACED score. An additional subgroup analysis was performed according to sex. Among all of the enrolled subjects, 49 (45.8%) were male and 58 (54.2%) were female. The mean BSI and FACED scores were 9.43 ± 3.81 and 1.92 ± 1.59, respectively. The serum albumin level (r = -0.49), bilirubin level (r = -0.31), C-reactive protein level (r = 0.22), hemoglobin level (r = -0.2), and platelet/lymphocyte ratio (r = 0.31) were significantly correlated with BSI. Meanwhile, serum albumin (r = -0.37) and bilirubin level (r = -0.25) showed a significant correlation with the FACED score. Multiple-linear regression analysis showed that the serum bilirubin level was independently associated with BSI, and the serum albumin level was independently associated with both scoring systems. Subgroup analysis revealed that the level of uric acid was also a significant variable independently associated with the BSI in male bronchiectasis subjects. Several laboratory variables were identified as possible prognostic factors for non-CF bronchiectasis. Among them, the serum albumin level exhibited the strongest correlation and was identified as an independent variable associated with the BSI and FACED scores. Copyright © 2017 by Daedalus Enterprises.
Naval Research Logistics Quarterly. Volume 28. Number 3,
1981-09-01
denotes component-wise maximum. f has antitone (isotone) differences on C x D if for c1 < c2 and d1 < d2, NAVAL RESEARCH LOGISTICS QUARTERLY VOL. 28...or negative correlations and linear or nonlinear regressions. Given are the moments to order two and, for special cases, the regression function and...data sets. We designate this bnb distribution as G - B - N(a, 0, v). The distribution admits only of positive correlation and linear regressions
Multi-sensory landscape assessment: the contribution of acoustic perception to landscape evaluation.
Gan, Yonghong; Luo, Tao; Breitung, Werner; Kang, Jian; Zhang, Tianhai
2014-12-01
In this paper, the contribution of visual and acoustic preference to multi-sensory landscape evaluation was quantitatively compared. The real landscapes were treated as dual-sensory ambiance and separated into visual landscape and soundscape. Both were evaluated by 63 respondents in laboratory conditions. The analysis of the relationship between respondents' visual and acoustic preference as well as their respective contribution to landscape preference showed that (1) some common attributes are universally identified in assessing visual, aural and audio-visual preference, such as naturalness or degree of human disturbance; (2) with acoustic and visual preferences as variables, a multi-variate linear regression model can satisfactorily predict landscape preference (R² = 0.740), while the coefficients of determination for a unitary linear regression model were 0.345 and 0.720 for visual and acoustic preference as predicting factors, respectively; (3) acoustic preference played a much more important role in landscape evaluation than visual preference in this study (the former is about 4.5 times the latter), which strongly suggests a rethinking of the role of soundscape in environment perception research and landscape planning practice.
Mental ability and psychological work performance in Chinese workers.
Zhong, Fei; Yano, Eiji; Lan, Yajia; Wang, Mianzhen; Wang, Zhiming; Wang, Xiaorong
2006-10-01
This study aimed to explore the relationship among mental ability, occupational stress, and psychological work performance in Chinese workers, and to identify relevant modifiers of mental ability and psychological work performance. Psychological Stress Intensity (PSI), psychological work performance, and mental ability (Mental Function Index, MFI) were determined among 485 Chinese workers (aged 33 to 62 yr, 65% men) with varied work occupations. The Occupational Stress Questionnaire (OSQ) and three mental ability tests (immediate memory, digit span, and cipher decoding) were used. The relationship between mental ability and psychological work performance was analyzed with a multiple linear regression approach. PSI, MFI, and psychological work performance differed significantly among work types and educational level groups (p<0.01). Multiple linear regression analysis showed that MFI was significantly related to gender, age, educational level, and work type. Higher MFI and lower PSI predicted better psychological work performance, even after adjustment for gender, age, educational level, and work type. The study suggests that occupational stress and low mental ability are important predictors of poor psychological work performance, which is modified by both gender and educational level.
Development of statistical linear regression model for metals from transportation land uses.
Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung
2009-01-01
Transportation land uses with impervious surfaces, such as highways, parking lots, roads, and bridges, are recognized as highly polluted non-point sources (NPSs) in urban areas. Many pollutants from urban transportation accumulate on paved surfaces during dry periods and are washed off during storms. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has been engaged in several research and monitoring efforts to develop stormwater management policies and treatment systems for future implementation. Data from 131 storm events between May 2004 and September 2008 at eleven sites were analyzed to identify correlations between particulates and metals, and to develop a simple linear regression (SLR) model to estimate event mean concentration (EMC). Results indicate that there was no significant relationship between metals and TSS EMC. However, the SLR estimation models, although not providing useful results, are valuable indicators of the high uncertainties that NPS pollution possesses. Therefore, long-term monitoring employing proper methods and precise statistical analysis of the data should be undertaken to eliminate these uncertainties.
Automating approximate Bayesian computation by local linear regression.
Thornton, Kevin R
2009-07-07
In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, ABCreg simplifies implementing ABC based on local-linear regression.
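The local linear-regression flavor of ABC described above can be sketched in a few lines. The following is a minimal illustration in Python, not ABCreg's own code: the toy model (a uniform prior on one parameter and a single noisy summary statistic), the function name `abc_local_linear`, the tolerance, and the Epanechnikov weighting are all assumptions chosen for the example.

```python
import random

def abc_local_linear(s_obs, n_sims=20000, tol=1.0, seed=1):
    """Rejection ABC followed by a local-linear regression adjustment.

    Toy model (an assumption for illustration): theta ~ Uniform(0, 10),
    summary statistic s = theta + Normal(0, 1) noise.
    """
    rng = random.Random(seed)
    accepted = []  # tuples of (theta, s - s_obs, kernel weight)
    for _ in range(n_sims):
        theta = rng.uniform(0.0, 10.0)
        s = theta + rng.gauss(0.0, 1.0)
        d = abs(s - s_obs)
        if d < tol:
            # Epanechnikov kernel: closer simulations get more weight.
            w = 1.0 - (d / tol) ** 2
            accepted.append((theta, s - s_obs, w))

    # Weighted least squares of theta on (s - s_obs).
    sw = sum(w for _, _, w in accepted)
    mx = sum(w * x for _, x, w in accepted) / sw
    my = sum(w * t for t, _, w in accepted) / sw
    sxx = sum(w * (x - mx) ** 2 for _, x, w in accepted)
    sxy = sum(w * (x - mx) * (t - my) for t, x, w in accepted)
    slope = sxy / sxx

    # Project each accepted draw onto s = s_obs.
    adjusted = [t - slope * x for t, x, _ in accepted]
    weights = [w for _, _, w in accepted]
    return adjusted, weights

adjusted, weights = abc_local_linear(s_obs=5.0)
post_mean = sum(w * t for t, w in zip(adjusted, weights)) / sum(weights)
print(f"accepted {len(adjusted)} draws, weighted posterior mean = {post_mean:.2f}")
```

The regression step is what distinguishes this from plain rejection ABC: accepted draws are shifted along the fitted line to where the simulated summary equals the observed one, which allows a looser tolerance for the same accuracy.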
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
NASA Astrophysics Data System (ADS)
Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini
2018-03-01
In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in near infrared (NIR) range viz. ratio spectral index (RSI) and normalised difference spectral indices (NDSI) sensitive to sucrose, reducing sugar and total sugar content were identified which were subsequently calibrated and validated. The RSI and NDSI models had R2 values of 0.65, 0.71 and 0.67; RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively for validation dataset. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were found to be, MARS, ANN and MARS, respectively with respect to RPD values of 2.08, 2.44, and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress as this technique is fast, economic, and noninvasive.
Spectral-Spatial Shared Linear Regression for Hyperspectral Image Classification.
Haoliang Yuan; Yuan Yan Tang
2017-04-01
Classification of the pixels in hyperspectral image (HSI) is an important task and has been popularly applied in many practical applications. Its major challenge is the high-dimensional small-sized problem. To deal with this problem, lots of subspace learning (SL) methods are developed to reduce the dimension of the pixels while preserving the important discriminant information. Motivated by ridge linear regression (RLR) framework for SL, we propose a spectral-spatial shared linear regression method (SSSLR) for extracting the feature representation. Comparing with RLR, our proposed SSSLR has the following two advantages. First, we utilize a convex set to explore the spatial structure for computing the linear projection matrix. Second, we utilize a shared structure learning model, which is formed by original data space and a hidden feature space, to learn a more discriminant linear projection matrix for classification. To optimize our proposed method, an efficient iterative algorithm is proposed. Experimental results on two popular HSI data sets, i.e., Indian Pines and Salinas demonstrate that our proposed methods outperform many SL methods.
Vitamin D insufficiency and subclinical atherosclerosis in non-diabetic males living with HIV.
Portilla, Joaquín; Moreno-Pérez, Oscar; Serna-Candel, Carmen; Escoín, Corina; Alfayate, Rocio; Reus, Sergio; Merino, Esperanza; Boix, Vicente; Giner, Livia; Sánchez-Payá, José; Picó, Antonio
2014-01-01
Vitamin D insufficiency (VDI) has been associated with increased cardiovascular risk in the non-HIV population. This study evaluates the relationship among serum 25-hydroxyvitamin D [25(OH)D] levels, cardiovascular risk factors, adipokines, antiviral therapy (ART) and subclinical atherosclerosis in HIV-infected males. A cross-sectional study in ambulatory care was made in non-diabetic patients living with HIV. VDI was defined as 25(OH)D serum levels <75 nmol/L. Fasting lipids, glucose, inflammatory markers (tumour necrosis factor-α, interleukin-6, high-sensitivity C-reactive protein) and endothelial markers (plasminogen activator inhibitor-1, or PAI-I) were measured. The common carotid artery intima-media thickness (C-IMT) was determined. A multivariate logistic regression analysis was made to identify factors associated with the presence of VDI, while multivariate linear regression analysis was used to identify factors associated with common C-IMT. Eighty-nine patients were included (age 42 ± 8 years), 18.9% were in CDC (US Centers for Disease Control and Prevention) stage C and 75 were on ART. VDI was associated with ART exposure, sedentary lifestyle, higher triglycerides levels and PAI-I. In univariate analysis, VDI was associated with greater common C-IMT. The multivariate linear regression model, adjusted by confounding factors, revealed an independent association between common C-IMT and patient age, time of exposure to protease inhibitors (PIs) and impaired fasting glucose (IFG). In contrast, there were no independent associations between common C-IMT and VDI or inflammatory and endothelial markers. VDI was not independently associated with subclinical atherosclerosis in non-diabetic males living with HIV. Older age, a longer exposure to PIs, and IFG were independent factors associated with common C-IMT in this population.
Simple linear and multivariate regression models.
Rodríguez del Águila, M M; Benítez-Parejo, N
2011-01-01
In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.
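As a small companion to the description of how a simple linear regression is calculated and checked, here is a hedged sketch of the closed-form least-squares fit. The article itself works in R; this example uses plain Python instead, and the function name `fit_simple_linear` and the five data points are made up for illustration.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals are the raw material for checking the model assumptions
    # (linearity, constant variance, approximate normality).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared, residuals

# Hypothetical data, not taken from the article.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared, residuals = fit_simple_linear(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```

Plotting `residuals` against `x` is the usual first check on the applicability assumptions the article mentions.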
Modeling Laterality of the Globus Pallidus Internus in Patients With Parkinson's Disease.
Sharim, Justin; Yazdi, Daniel; Baohan, Amy; Behnke, Eric; Pouratian, Nader
2017-04-01
Neurosurgical interventions such as deep brain stimulation surgery of the globus pallidus internus (GPi) play an important role in the treatment of medically refractory Parkinson's disease (PD), and require high targeting accuracy. Variability in the laterality of the GPi across patients with PD has not been well characterized. The aim of this report is to identify factors that may contribute to differences in position of the motor region of GPi. The charts and operative reports of 101 PD patients following deep brain stimulation surgery (70 males, aged 11-78 years) representing 201 GPi were retrospectively reviewed. Data extracted for each subject include age, gender, anterior and posterior commissures (AC-PC) distance, and third ventricular width. Multiple linear regression, stepwise regression, and relative importance of regressors analysis were performed to assess the predictive ability of these variables on GPi laterality. Multiple linear regression for target vs. third ventricular width, gender, AC-PC distance, and age were significant for normalized linear regression coefficients of 0.333 (p < 0.0001), 0.206 (p = 0.00219), 0.168 (p = 0.0119), and 0.159 (p = 0.0136), respectively. Third ventricular width, gender, AC-PC distance, and age each account for 44.06% (21.38-65.69%, 95% CI), 20.82% (10.51-35.88%), 21.46% (8.28-37.05%), and 13.66% (2.62-28.64%) of the R² value, respectively. Effect size calculation was significant for a change in the GPi laterality of 0.19 mm per mm of ventricular width, 0.11 mm per mm of AC-PC distance, 0.017 mm per year in age, and 0.54 mm increase for male gender. This variability highlights the limitations of indirect targeting alone, and argues for the continued use of MRI as well as intraoperative physiological testing to account for such factors that contribute to patient-specific variability in GPi localization. © 2016 International Neuromodulation Society.
Refractive Status at Birth: Its Relation to Newborn Physical Parameters at Birth and Gestational Age
Varghese, Raji Mathew; Sreenivas, Vishnubhatla; Puliyel, Jacob Mammen; Varughese, Sara
2009-01-01
Background: Refractive status at birth is related to gestational age. Preterm babies have myopia which decreases as gestational age increases and term babies are known to be hypermetropic. This study looked at the correlation of refractive status with birth weight in term and preterm babies, and with physical indicators of intra-uterine growth such as the head circumference and length of the baby at birth. Methods: All babies delivered at St. Stephens Hospital and admitted in the nursery were eligible for the study. Refraction was performed within the first week of life. 0.8% tropicamide with 0.5% phenylephrine was used to achieve cycloplegia and paralysis of accommodation. 599 newborn babies participated in the study. Data pertaining to the right eye is utilized for all the analyses except that for anisometropia where the two eyes were compared. Growth parameters were measured soon after birth. Simple linear regression analysis was performed to see the association of refractive status, (mean spherical equivalent (MSE), astigmatism and anisometropia) with each of the study variables, namely gestation, length, weight and head circumference. Subsequently, multiple linear regression was carried out to identify the independent predictors for each of the outcome parameters. Results: Simple linear regression showed a significant relation between all 4 study variables and refractive error but in multiple regression only gestational age and weight were related to refractive error. The partial correlation of weight with MSE adjusted for gestation was 0.28 and that of gestation with MSE adjusted for weight was 0.10. Birth weight had a higher correlation to MSE than gestational age. Conclusion: This is the first study to look at refractive error against all these growth parameters, in preterm and term babies at birth. It would appear from this study that birth weight rather than gestation should be used as criteria for screening for refractive error, especially in developing countries where the incidence of intrauterine malnutrition is higher. PMID:19214228
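The partial correlations quoted in the results (weight with MSE adjusted for gestation, and the reverse) follow the standard first-order partial correlation formula. The sketch below shows that formula in Python; the function names and the input correlations are hypothetical, since the abstract does not report the pairwise correlations needed to reproduce 0.28 or 0.10.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations, for illustration only:
# x = birth weight, y = MSE, z = gestational age.
r_weight_mse = 0.40
r_weight_gest = 0.70
r_mse_gest = 0.35
adjusted = partial_corr(r_weight_mse, r_weight_gest, r_mse_gest)
print(f"partial correlation of weight with MSE given gestation: {adjusted:.3f}")
```

When two predictors are themselves strongly correlated, as weight and gestation are, the partial correlation can be much smaller than the raw one, which is why the study reports the adjusted values.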
Emission and distribution of phosphine in paddy fields and its relationship with greenhouse gases.
Chen, Weiyi; Niu, Xiaojun; An, Shaorong; Sheng, Hong; Tang, Zhenghua; Yang, Zhiquan; Gu, Xiaohong
2017-12-01
Phosphine (PH3), as a gaseous phosphide, plays an important role in the phosphorus cycle in ecosystems. In this study, the emission and distribution of phosphine, carbon dioxide (CO2) and methane (CH4) in paddy fields were investigated to speculate the future potential impacts of enhanced greenhouse effect on phosphorus cycle involved in phosphine by the method of Pearson correlation analysis and multiple linear regression analysis. During the whole period of rice growth, there was a significant positive correlation between CO2 emission flux and PH3 emission flux (r=0.592, p=0.026, n=14). Similarly, a significant positive correlation of emission flux was also observed between CH4 and PH3 (r=0.563, p=0.036, n=14). The linear regression relationship was determined as [PH3]flux = 0.007[CO2]flux + 0.063[CH4]flux - 4.638. No significant differences were observed for all values of matrix-bound phosphine (MBP), soil carbon dioxide (SCO2), and soil methane (SCH4) in paddy soils. However, there was a significant positive correlation between MBP and SCO2 at heading, flowering and ripening stage. The correlation coefficients were 0.909, 0.890 and 0.827, respectively. In vertical distribution, MBP had the analogical variation trend with SCO2 and SCH4. Through Pearson correlation analysis and multiple stepwise linear regression analysis, pH, redox potential (Eh), total phosphorus (TP) and acid phosphatase (ACP) were identified as the principal factors affecting MBP levels, with correlative rankings of Eh>pH>TP>ACP. The multiple stepwise regression model ([MBP] = 0.456[ACP] + 0.235[TP] - 1.458[Eh] - 36.547[pH] + 352.298) was obtained. The findings in this study hold great reference values to the global biogeochemical cycling of phosphorus in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-09-14
This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic-but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
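The feature-extraction idea described above, fitting a local linear model in each moving window and then summarizing the coefficients over longer intervals, can be sketched as follows. This is an illustration in plain Python and not the package's code; the function names `window_fits` and `summarize_slopes`, the window length, and the block size are assumptions.

```python
def window_fits(series, window):
    """Least-squares line y = a + b*t in each sliding window (t = 0..window-1)."""
    t = list(range(window))
    mean_t = sum(t) / window
    stt = sum((ti - mean_t) ** 2 for ti in t)
    fits = []
    for start in range(len(series) - window + 1):
        y = series[start:start + window]
        mean_y = sum(y) / window
        sty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
        slope = sty / stt
        intercept = mean_y - slope * mean_t
        fits.append((intercept, slope))
    return fits

def summarize_slopes(fits, block):
    """Mean slope over consecutive blocks of `block` windows."""
    slopes = [b for _, b in fits]
    return [
        sum(slopes[i:i + block]) / len(slopes[i:i + block])
        for i in range(0, len(slopes), block)
    ]

# A perfectly linear series: every window should report the same slope.
series = [2.0 * i + 1.0 for i in range(10)]
fits = window_fits(series, window=4)
features = summarize_slopes(fits, block=3)
print(features)
```

For multivariate data the same two functions would be applied to each series separately, and the resulting per-interval summaries stacked into a feature matrix for the downstream analysis.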
2014-12-01
Primary Military Occupational Specialty PRO Proficiency Q-Q Quantile-Quantile RSS Residual Sum of Squares SI Shop Information T&R Training and...construct multivariate linear regression models to estimate Marines’ Computed Tier Score and time to achieve E-4 based on their individual personal...Science (GS) score, ASVAB Mathematics Knowledge (MK) score, ASVAB Paragraph Comprehension (PC) score, weight, and whether a Marine receives a weight
Serrano-Gallardo, Pilar; Martínez-Marcos, Mercedes; Espejo-Matorrales, Flora; Arakawa, Tiemi; Magnabosco, Gabriela Tavares; Pinto, Ione Carvalho
2016-01-01
ABSTRACT Objective: to identify the students' perception about the quality of clinical placements and assess the influence of the different tutoring processes in clinical learning. Methods: analytical cross-sectional study on second and third year nursing students (n=122) about clinical learning in primary health care. The Clinical Placement Evaluation Tool and a synthetic index of attitudes and skills were computed to give scores to the clinical learning (scale 0-10). Univariate, bivariate and multivariate (multiple linear regression) analyses were performed. Results: the response rate was 91.8%. The most commonly identified tutoring process was "preceptor-professor" (45.2%). The clinical placement was assessed as "optimal" by 55.1%, relationship with team-preceptor was considered good by 80.4% of the cases and the average grade for clinical learning was 7.89. The multiple linear regression model with more explanatory capacity included the variables "Academic year" (beta coefficient = 1.042 for third-year students), "Primary Health Care Area (PHC)" (beta coefficient = 0.308 for Area B) and "Clinical placement perception" (beta coefficient = -0.204 for a suboptimal perception). Conclusions: timeframe within the academic program, location and clinical placement perception were associated with students' clinical learning. Students' perceptions of setting quality were positive and a good team-preceptor relationship is a matter of relevance. PMID:27627124
Reese, Jared C; Karsy, Michael; Twitchell, Spencer; Bisson, Erica F
2018-04-11
Examining the costs of single- and multilevel anterior cervical discectomy and fusion (ACDF) is important for the identification of cost drivers and potentially reducing patient costs. A novel tool at our institution provides direct costs for the identification of potential drivers. To assess perioperative healthcare costs for patients undergoing an ACDF. Patients who underwent an elective ACDF between July 2011 and January 2017 were identified retrospectively. Factors adding to total cost were placed into subcategories to identify the most significant contributors, and potential drivers of total cost were evaluated using a multivariable linear regression model. A total of 465 patients (mean age, 53 ± 12 yr; 54% male) met the inclusion criteria for this study. The distribution of total cost was broken down into supplies/implants (39%), facility utilization (37%), physician fees (14%), pharmacy (7%), imaging (2%), and laboratory studies (1%). A multivariable linear regression analysis showed that total cost was significantly affected by the number of levels operated on, operating room time, and length of stay. Costs also showed a narrow distribution with few outliers and did not vary significantly over time. These results suggest that facility utilization and supplies/implants are the predominant cost contributors, accounting for 76% of the total cost of ACDF procedures. Efforts at lowering costs within these categories should make the most impact on providing more cost-effective care.
Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M
2017-04-01
A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C14-C18) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and nonlinear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by nonlinear regression method. The nonlinear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and nonlinear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
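To make the linearization step concrete, the sketch below fits a Langmuir isotherm through a linearized form and then scores it on the original scale. It assumes that "Type II" refers to the common 1/qe versus 1/Ce linearization, which the abstract does not spell out. The affinity constant, the concentration grid, and the function names are hypothetical; only the Q0 value of 2000 μg/g echoes the figure reported for imidacloprid, and the data are synthetic and noise-free.

```python
def langmuir(ce, q0, b):
    """Langmuir isotherm: sorbed amount qe at equilibrium concentration ce."""
    return q0 * b * ce / (1.0 + b * ce)

def fit_line(x, y):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_langmuir_linearized(ce, qe):
    """Fit 1/qe = 1/Q0 + (1/(b*Q0)) * (1/Ce) and back out Q0 and b."""
    slope, intercept = fit_line([1.0 / c for c in ce], [1.0 / q for q in qe])
    q0 = 1.0 / intercept
    b = intercept / slope
    return q0, b

def sse(ce, qe, q0, b):
    """Sum of squared errors on the original (untransformed) scale."""
    return sum((q - langmuir(c, q0, b)) ** 2 for c, q in zip(ce, qe))

# Synthetic, noise-free data (hypothetical b = 0.05 and concentrations).
ce = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
qe = [langmuir(c, 2000.0, 0.05) for c in ce]
q0_hat, b_hat = fit_langmuir_linearized(ce, qe)
print(f"Q0 = {q0_hat:.1f}, b = {b_hat:.4f}, SSE = {sse(ce, qe, q0_hat, b_hat):.3g}")
```

With noisy measurements the reciprocal transform distorts the error structure, so the linearized fit and a fit that minimizes `sse` directly can disagree; that is the bias the abstract's nonlinear error analysis is meant to expose.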
Zhou, Qing-he; Zhu, Bo; Wei, Chang-na; Yan, Min
2016-03-24
Studies have shown that abdominal girth and vertebral column length have high predictive value for spinal spread after administering a dose of plain bupivacaine. We designed a study to identify the specific correlations between abdominal girth, vertebral column length and the dose of 0.5% plain bupivacaine that provides a minimum upper block level (T12) and a suitable upper block level (T10) for lower limb surgeries. A suitable dose of 0.5% plain bupivacaine was administered intrathecally between the L3 and L4 vertebrae for lower limb surgeries. If the upper cephalad spread of the patient by loss of pinprick discrimination was T12 or T10, the patient was enrolled in this study. Five patient variables and intrathecal plain bupivacaine dose were recorded. Linear regression and multiple regression analyses were performed. Totals of 111 patients and 121 patients who lost pinprick discrimination at T12 and T10, respectively, were analyzed in this study. Linear regression analysis showed that only abdominal girth and plain bupivacaine dose were strongly correlated (r = -0.827 for T12, r = -0.806 for T10; both p < 0.0001). Multiple linear regression analysis showed that both abdominal girth and vertebral column length were the key determinants of plain bupivacaine dose (both p < 0.0001). R² was 0.874 and 0.860 for the loss of pinprick discrimination at T12 and T10, respectively. Our data indicated that vertebral column length and abdominal girth were strongly correlated with the dosage of intrathecal plain bupivacaine for the loss of pinprick discrimination at T12 and T10. The two regression equations were YT12 = 3.547 + 0.045X1 - 0.044X2 and YT10 = 3.848 + 0.047X1 - 0.046X2 (Y, 0.5% plain bupivacaine volume; X1, vertebral column length; and X2, abdominal girth), which can, to a great extent, accurately predict the minimum and suitable intrathecal bupivacaine doses for lower limb surgery, respectively.
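The two regression equations reported above can be evaluated with a few lines of code. The coefficients below are copied from the abstract; the function name and the input values are hypothetical and chosen only to show the arithmetic, and the abstract does not state the measurement units.

```python
# Coefficients (intercept, vertebral column length, abdominal girth)
# as reported in the abstract for each target block level.
COEFFICIENTS = {
    "T12": (3.547, 0.045, -0.044),
    "T10": (3.848, 0.047, -0.046),
}


def predicted_volume(vertebral_column_length, abdominal_girth, level):
    """Evaluate Y = b0 + b1*X1 + b2*X2 for the chosen block level."""
    intercept, b_length, b_girth = COEFFICIENTS[level]
    return (intercept
            + b_length * vertebral_column_length
            + b_girth * abdominal_girth)


# Hypothetical inputs, in whatever units the original study used
example_length = 60.0
example_girth = 90.0
for level in ("T12", "T10"):
    print(level, round(predicted_volume(example_length, example_girth, level), 3))
```

The opposite signs of the two slopes show the pattern described in the abstract: the predicted volume rises with vertebral column length and falls with abdominal girth.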
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. 
For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
Calculating stage duration statistics in multistage diseases.
Komarova, Natalia L; Thalhauser, Craig J
2011-01-01
Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for statistical hypothesis testing needed to determine if treatment is having a significant effect on the progression, or if a new therapy is showing a delay of progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions; however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimations for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.
Can change in high-density lipoprotein cholesterol levels reduce cardiovascular risk?
Dean, Bonnie B; Borenstein, Jeff E; Henning, James M; Knight, Kevin; Merz, C Noel Bairey
2004-06-01
The cardiovascular risk reduction observed in many trials of lipid-lowering agents is greater than expected on the basis of observed low-density lipoprotein cholesterol (LDL-C) level reductions. Our objective was to explore the degree to which high-density lipoprotein cholesterol (HDL-C) level changes explain cardiovascular risk reduction. A systematic review identified trials of lipid-lowering agents reporting changes in HDL-C and LDL-C levels and the incidence of coronary heart disease (CHD). The observed relative risk reduction (RRR) in CHD morbidity and mortality rates was calculated. The expected RRR, given the treatment effect on total cholesterol level, was calculated for each trial with logistic regression coefficients from observational studies. The difference between observed and expected RRR was plotted against the change in HDL-C level, and a least-squares regression line was calculated. Fifty-one trials were identified. Nineteen statin trials addressed the association of HDL-C with CHD. Limited numbers of trials of other therapies precluded additional analyses. Among statin trials, therapy reduced total cholesterol levels as much as 32% and LDL-C levels as much as 45%. HDL-C level increases were <10%. Treatment effect on HDL-C levels was not a significant linear predictor of the difference in observed and expected CHD mortality rates, although we observed a trend in this direction (P =.08). Similarly, HDL-C effect was not a significant linear predictor of the difference between observed and expected RRRs for CHD morbidity (P =.20). Although a linear trend toward greater risk reduction was observed with greater effects on HDL-C, differences were not statistically significant. The narrow range of HDL-C level increases in the statin trials likely reduced our ability to detect a beneficial HDL-C effect, if present.
Comparison of buried sand ridges and regressive sand ridges on the outer shelf of the East China Sea
NASA Astrophysics Data System (ADS)
Wu, Ziyin; Jin, Xianglong; Zhou, Jieqiong; Zhao, Dineng; Shang, Jihong; Li, Shoujun; Cao, Zhenyi; Liang, Yuyang
2017-06-01
Based on multi-beam echo soundings and high-resolution single-channel seismic profiles, linear sand ridges in U14 and U2 on the East China Sea (ECS) shelf are identified and compared in detail. Linear sand ridges in U14 are buried sand ridges, which are 90 m below the seafloor. It is presumed that these buried sand ridges belong to the transgressive systems tract (TST) formed 320-200 ka ago and that their top interface is the maximal flooding surface (MFS). Linear sand ridges in U2 are regressive sand ridges. It is presumed that these buried sand ridges belong to the TST of the last glacial maximum (LGM) and that their top interface is the MFS of the LGM. Four sub-stage sand ridges of U2 are discerned from the high-resolution single-channel seismic profile and four strikes of regressive sand ridges are distinguished from the submarine topographic map based on the multi-beam echo soundings. These multi-stage and multi-strike linear sand ridges are the response of, and evidence for, the evolution of submarine topography with respect to sea-level fluctuations since the LGM. Although the difference in the age of formation between U14 and U2 is 200 ka and their sequences are 90 m apart, the general strikes of the sand ridges are similar. This indicates that the basic configuration of tidal waves on the ECS shelf has been stable for the last 200 ka. A basic evolutionary model of the strata of the ECS shelf is proposed, in which sea-level change is the controlling factor. During the sea-level change of about 100 ka, five to six strata are developed and the sand ridges develop in the TST. A similar story of the evolution of paleo-topography on the ECS shelf has been repeated during the last 300 ka.
Changes in Clavicle Length and Maturation in Americans: 1840-1980.
Langley, Natalie R; Cridlin, Sandra
2016-01-01
Secular changes refer to short-term biological changes ostensibly due to environmental factors. Two well-documented secular trends in many populations are earlier age of menarche and increasing stature. This study synthesizes data on maximum clavicle length and fusion of the medial epiphysis in 1840-1980 American birth cohorts to provide a comprehensive assessment of developmental and morphological change in the clavicle. Clavicles from the Hamann-Todd Human Osteological Collection (n = 354), McKern and Stewart Korean War males (n = 341), Forensic Anthropology Data Bank (n = 1,239), and the McCormick Clavicle Collection (n = 1,137) were used in the analysis. Transition analysis was used to evaluate fusion of the medial epiphysis (scored as unfused, fusing, or fused). Several statistical treatments were used to assess fluctuations in maximum clavicle length. First, Durbin-Watson tests were used to evaluate autocorrelation, and a local regression (LOESS) was used to identify visual shifts in the regression slope. Next, piecewise regression was used to fit linear regression models before and after the estimated breakpoints. Multiple starting parameters were tested in the range determined to contain the breakpoint, and the model with the smallest mean squared error was chosen as the best fit. The parameters from the best-fit models were then used to derive the piecewise models, which were compared with the initial simple linear regression models to determine which model provided the best fit for the secular change data. The epiphyseal union data indicate a decline in the age at onset of fusion since the early twentieth century. Fusion commences approximately four years earlier in mid- to late twentieth-century birth cohorts than in late nineteenth- and early twentieth-century birth cohorts. However, fusion is completed at roughly the same age across cohorts. 
The most significant decline in age at onset of epiphyseal union appears to have occurred since the mid-twentieth century. LOESS plots show a breakpoint in the clavicle length data around the mid-twentieth century in both sexes, and piecewise regression models indicate a significant decrease in clavicle length in the American population after 1940. The piecewise model provides a slightly better fit than the simple linear model. Since the model standard error is not substantially different from the piecewise model, an argument could be made to select the less complex linear model. However, we chose the piecewise model to detect changes in clavicle length that are overfitted with a linear model. The decrease in maximum clavicle length is in line with a documented narrowing of the American skeletal form, as shown by analyses of cranial and facial breadth and bi-iliac breadth of the pelvis. Environmental influences on skeletal form include increases in body mass index, health improvements, improved socioeconomic status, and elimination of infectious diseases. Secular changes in bony dimensions and skeletal maturation stipulate that medical and forensic standards used to deduce information about growth, health, and biological traits must be derived from modern populations.
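The breakpoint selection described in this abstract can be illustrated with a simplified grid search. This is a sketch under stated assumptions, not the authors' procedure: the abstract describes testing multiple starting parameters, whereas the code below simply tries each candidate breakpoint, fits separate lines before and after it, and keeps the candidate with the smallest mean squared error. The two segments are fitted independently and are not forced to join. All data values are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_error(xs, ys):
    """Sum of squared residuals around the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_breakpoint(xs, ys, candidates, min_points=3):
    """Return (breakpoint, mse) for the candidate with the smallest MSE."""
    best = None
    for candidate in candidates:
        left = [(x, y) for x, y in zip(xs, ys) if x <= candidate]
        right = [(x, y) for x, y in zip(xs, ys) if x > candidate]
        if len(left) < min_points or len(right) < min_points:
            continue  # too few points to fit a line on one side
        total = (sum_squared_error(*zip(*left))
                 + sum_squared_error(*zip(*right)))
        mse = total / len(xs)
        if best is None or mse < best[1]:
            best = (candidate, mse)
    return best


# Synthetic birth-cohort data: slow rise, then decline after the mid-1940s
years = list(range(1840, 1990, 10))
lengths = [150 + 0.02 * (y - 1840) if y <= 1945 else 152.1 - 0.05 * (y - 1945)
           for y in years]

break_year, piecewise_mse = best_breakpoint(years, lengths, range(1870, 1960, 10))
linear_mse = sum_squared_error(years, lengths) / len(years)
print(break_year, piecewise_mse, linear_mse)
```

Comparing `piecewise_mse` with `linear_mse` mirrors the model comparison in the abstract; because the piecewise model has more parameters, a small improvement in error alone does not settle which model to prefer.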
NASA Technical Reports Server (NTRS)
Clark, P. E.; Andre, C. G.; Adler, I.; Weidner, J.; Podwysocki, M.
1976-01-01
The positive correlation between Al/Si X-ray fluorescence intensity ratios determined during the Apollo 15 lunar mission and a broad-spectrum visible albedo of the moon is quantitatively established. Linear regression analysis performed on 246 one-degree geographic cells of X-ray fluorescence intensity and visible albedo data points produced a statistically significant correlation coefficient of 0.78. Three distinct distributions of data were identified as (1) within one standard deviation of the regression line, (2) greater than one standard deviation below the line, and (3) greater than one standard deviation above the line. The latter two distributions of data were found to occupy distinct geographic areas in the Palus Somni region.
1994-09-01
Institute of Technology, Wright-Patterson AFB OH, January 1994. 4. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 5...Technology, Wright-Patterson AFB OH 5 April 1994. 29. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 30. Office of
An Evaluation of the Automated Cost Estimating Integrated Tools (ACEIT) System
1989-09-01
residual and it is described as the residual divided by its standard deviation (13:App A,17). Neter, Wasserman, and Kutner, in Applied Linear Regression Models...others. Applied Linear Regression Models. Homewood IL: Irwin, 1983. 19. Raduchel, William J. "A Professional’s Perspective on User-Friendliness," Byte
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Conjoint Analysis: A Study of the Effects of Using Person Variables.
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
Fitting program for linear regressions according to Mahon (1996)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trappitsch, Reto G.
2018-01-09
This program takes the user's input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple interface.
How Robust Is Linear Regression with Dummy Variables?
ERIC Educational Resources Information Center
Blankmeyer, Eric
2006-01-01
Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method
ERIC Educational Resources Information Center
Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev
2018-01-01
The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
ERIC Educational Resources Information Center
Thompson, Russel L.
Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…
On the null distribution of Bayes factors in linear regression
USDA-ARS's Scientific Manuscript database
We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...
Common pitfalls in statistical analysis: Linear regression analysis
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis. PMID:28447022
Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.
Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo
2015-08-01
Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.
NASA Astrophysics Data System (ADS)
Chakraborty, Joheen; Banerji, Sugata
2018-03-01
Driven by a desire to control climate change and reduce the dependence on fossil fuels, governments around the world are increasing the adoption of renewable energy sources. However, among the US states, we observe a wide disparity in renewable penetration. In this study, we have identified and cleaned over a dozen datasets representing solar energy penetration in each US state, and the potentially relevant socioeconomic and other factors that may be driving the growth in solar. We have applied a number of predictive modeling approaches - including machine learning and regression - on these datasets over a 17-year period and evaluated the relative performance of the models. Our goals were: (1) identify the most important factors that are driving the growth in solar, (2) choose the most effective predictive modeling technique for solar growth, and (3) develop a model for predicting next year’s solar growth using this year’s data. We obtained very promising results with random forests (about 90% efficacy) and varying degrees of success with support vector machines and regression techniques (linear, polynomial, ridge). We also identified states with solar growth slower than expected and representing a potential for stronger growth in future.
Construction and analysis of a modular model of caspase activation in apoptosis
Harrington, Heather A; Ho, Kenneth L; Ghosh, Samik; Tung, KC
2008-01-01
Background A key physiological mechanism employed by multicellular organisms is apoptosis, or programmed cell death. Apoptosis is triggered by the activation of caspases in response to both extracellular (extrinsic) and intracellular (intrinsic) signals. The extrinsic and intrinsic pathways are characterized by the formation of the death-inducing signaling complex (DISC) and the apoptosome, respectively; both the DISC and the apoptosome are oligomers with complex formation dynamics. Additionally, the extrinsic and intrinsic pathways are coupled through the mitochondrial apoptosis-induced channel via the Bcl-2 family of proteins. Results A model of caspase activation is constructed and analyzed. The apoptosis signaling network is simplified through modularization methodologies and equilibrium abstractions for three functional modules. The mathematical model is composed of a system of ordinary differential equations which is numerically solved. Multiple linear regression analysis investigates the role of each module and reduced models are constructed to identify key contributions of the extrinsic and intrinsic pathways in triggering apoptosis for different cell lines. Conclusion Through linear regression techniques, we identified the feedbacks, dissociation of complexes, and negative regulators as the key components in apoptosis. The analysis and reduced models for our model formulation reveal that the chosen cell lines predominantly exhibit strong extrinsic caspase behavior, typical of type I cells. Furthermore, under the simplified model framework, the selected cell lines exhibit different modes by which caspase activation may occur. Finally, the proposed modularized model of apoptosis may generalize behavior for additional cells and tissues, specifically identifying and predicting components responsible for the transition from type I to type II cell behavior. PMID:19077196
Gressel, Gregory M; Van Arsdale, Anne; Dioun, Shayan M; Goldberg, Gary L; Nevadunsky, Nicole S
2017-05-01
The application and interview process for gynecologic oncology fellowship is highly competitive, time-consuming and expensive for applicants. We conducted a survey of successfully matched gynecologic oncology fellowship applicants to assess problems associated with the interview process and identify areas for improvement. All Society of Gynecologic Oncology (SGO) list-serve members who have participated in the match program for gynecologic oncology fellowship were asked to complete an online survey regarding the interview process. Linear regression modeling was used to examine association between year of match, number of programs applied to, cost incurred, and overall satisfaction. Two hundred and sixty-nine eligible participants reported applying to a mean of 20 programs [range 1-45] and were offered a mean of 14 interviews [range 1-43]. They spent an average of $6000 [$0-25,000], using personal savings (54%), credit cards (50%), family support (12%) or personal loans (3%). Seventy percent of respondents identified the match as fair, and 93% were satisfied. Interviewees spent a mean of 15 [0-45] days away from work and 37% reported difficulty arranging coverage. Linear regression showed an increase in number of programs applied to and cost per applicant over time (p < 0.001) between 1993 and 2016. Applicants who applied to all available programs spent more (p < 0.001) than those who applied to programs based on their location or quality. The current fellowship match was identified as fair and satisfying by most respondents despite being time consuming and expensive. Suggested alternative options included clustering interviews geographically or conducting preliminary interviews at the SGO Annual Meeting.
Liu, Guorui; Cai, Zongwei; Zheng, Minghui; Jiang, Xiaoxu; Nie, Zhiqiang; Wang, Mei
2015-01-01
Identifying marker congeners of unintentionally produced polychlorinated naphthalenes (PCNs) from industrial thermal sources might be useful for predicting total PCN (∑2-8PCN) emissions by the determination of only indicator congeners. In this study, potential indicator congeners were identified based on the PCN data in 122 stack gas samples from over 60 plants involved in more than ten industrial thermal sources reported in our previous case studies. Linear regression analyses identified that the concentrations of CN27/30, CN52/60, and CN66/67 correlated significantly with ∑2-8PCN (R² = 0.77, 0.80, and 0.58, respectively; n = 122, p < 0.05), making them good candidates for indicator congeners. Equations describing relationships between indicators and ∑2-8PCN were established. The linear regression analyses involving 122 samples showed that the relationships between the indicator congeners and ∑2-8PCN were not significantly affected by factors such as industry types, raw materials used, or operating conditions. Hierarchical cluster analysis and similarity calculations for the 122 stack gas samples were adopted to group those samples and evaluate their similarities and differences based on the PCN homolog distributions from different industrial thermal sources. Generally, the fractions of less chlorinated homologs, comprising di-, tri-, and tetra-homologs, were much higher than those of more chlorinated homologs for up to 111 stack gas samples contained in groups 1 and 2, indicating the dominance of lower chlorinated homologs in stack gas from industrial thermal sources. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Wu, Cheng; Zhen Yu, Jian
2018-03-01
Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low-R² XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least bias in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
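To make the contrast between OLS and Deming regression concrete, the closed-form Deming estimator can be sketched as follows. This is an illustration only, not the Igor Pro program named in the abstract. The parameter `delta` is assumed here to be the ratio of the Y error variance to the X error variance; the abstract does not define its weighting parameter λ, which may follow a different (for example reciprocal) convention. The data are synthetic.

```python
import math


def sample_moments(xs, ys):
    """Return means and sample (co)variances: mean_x, mean_y, sxx, syy, sxy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return mean_x, mean_y, sxx, syy, sxy


def ols_fit(xs, ys):
    """Ordinary least squares: all error is attributed to Y."""
    mean_x, mean_y, sxx, _, sxy = sample_moments(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deming_fit(xs, ys, delta=1.0):
    """Deming regression with delta = var(Y errors) / var(X errors) (assumed).

    Requires a non-zero covariance between xs and ys.
    """
    mean_x, mean_y, sxx, syy, sxy = sample_moments(xs, ys)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


# Synthetic data: true relation y = 2x + 1, with errors added to both axes
true_x = [1, 2, 3, 4, 5, 6, 7, 8]
x_obs = [x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(true_x)]
y_obs = [2 * x + 1 + (0.4 if i % 2 == 0 else -0.4) for i, x in enumerate(true_x)]

print("OLS:   ", ols_fit(x_obs, y_obs))
print("Deming:", deming_fit(x_obs, y_obs, delta=1.0))
```

When the covariance is positive, the Deming slope lies between the OLS slope of Y on X and the inverse of the OLS slope of X on Y, and the choice of `delta` decides where in that interval it falls, which is why a mis-specified weighting parameter biases the fit.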
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative structure-activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R²(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Wavelet regression model in forecasting crude oil price
NASA Astrophysics Data System (ADS)
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
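The two error measures used to compare the forecasting models are simple to compute. The sketch below shows the standard definitions; the price values are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more strongly."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical daily prices and forecasts
actual_prices = [50.0, 52.0, 51.0, 49.0]
forecast_prices = [51.0, 50.0, 51.0, 50.0]
print(rmse(actual_prices, forecast_prices), mae(actual_prices, forecast_prices))
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and a large gap between the two points to a few large misses rather than uniformly moderate ones.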
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The ε-insensitive loss function was used for SVR modeling. Selection of the design parameters, which include capacity (C), insensitivity region (ε) and the RBF kernel parameter (sigma), was made based on a grid search approach, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
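The k-fold averaging behind the AMSE figures can be sketched for the linear regression baseline. This is a minimal illustration, assuming that AMSE means the mean of the per-fold test mean squared errors; the abstract does not give the fold count or how folds were assigned, so the interleaved folds, the function names and the data below are hypothetical. The SVR model itself is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_amse(xs, ys, k=3):
    """Average of the test-set MSE over k interleaved folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        squared = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Hypothetical data: one linear response and one curved response
times = [float(t) for t in range(9)]
linear_response = [3 * t + 2 for t in times]
curved_response = [t ** 2 for t in times]
print(k_fold_amse(times, linear_response), k_fold_amse(times, curved_response))
```

A straight line predicts the held-out points of the curved response poorly, which is the situation in which a non-linear model such as SVR would be expected to give a lower AMSE.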
Bai, Lu; Chan, Ching-Yao; Liu, Pan; Xu, Chengcheng
2017-10-03
Electric bikes (e-bikes) have been one of the fastest growing trip modes in Southeast Asia over the past 2 decades. The increasing popularity of e-bikes raised some safety concerns regarding urban transport systems. The primary objective of this study was to identify whether and how the generalized linear regression model (GLM) could be used to relate cyclists' safety with various contributing factors when riding in a mid-block bike lane. The types of 2-wheeled vehicles in the study included bicycle-style electric bicycles (BSEBs), scooter-style electric bicycles (SSEBs), and regular bicycles (RBs). Traffic conflict technology was applied as a surrogate measure to evaluate the safety of 2-wheeled vehicles. The safety performance model was developed by adopting a generalized linear regression model for relating the frequency of rear-end conflicts between e-bikes and regular bikes to the operating speeds of BSEBs, SSEBs, and RBs in mid-block bike lanes. The frequency of rear-end conflicts between e-bikes and bikes increased with an increase in the operating speeds of e-bikes and the volume of e-bikes and bikes and decreased with an increase in the width of bike lanes. The large speed difference between e-bikes and bikes increased the frequency of rear-end conflicts between e-bikes and bikes in mid-block bike lanes. A 1% increase in the average operating speed of e-bikes would increase the expected number of rear-end conflicts between e-bikes and bikes by 1.48%. A 1% increase in the speed difference between e-bikes and bikes would increase the expected number of rear-end conflicts between e-bikes/bikes by 0.16%. The conflict frequency in mid-block bike lanes can be modeled using generalized linear regression models. The factors that significantly affected the frequency of rear-end conflicts included the operating speeds of e-bikes, the speed difference between e-bikes and regular bikes, the volume of e-bikes, the volume of bikes, and the width of bike lanes. 
The safety performance model can help better understand the causes of crash occurrences in mid-block bike lanes.
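The percentage effects reported in this abstract read like elasticities. The following is a minimal sketch, assuming a constant-elasticity (log-log) relation between speed and expected conflicts, which the abstract does not confirm. The function name and the baseline count are hypothetical; only the values 1.48 and 0.16 come from the abstract.

```python
def expected_conflicts(baseline: float, pct_change: float, elasticity: float) -> float:
    """Scale a baseline conflict count under an assumed constant-elasticity model."""
    return baseline * (1.0 + pct_change / 100.0) ** elasticity


# Elasticities taken from the abstract: 1.48 (e-bike operating speed)
# and 0.16 (speed difference between e-bikes and bikes).
baseline = 100.0  # hypothetical number of rear-end conflicts

print(expected_conflicts(baseline, 1.0, 1.48))  # roughly a 1.48% increase
print(expected_conflicts(baseline, 1.0, 0.16))  # roughly a 0.16% increase
```

Under this assumed form, a 1% change reproduces the reported percentages to two decimals; larger changes compound rather than scale linearly.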
Optimizing complex phenotypes through model-guided multiplex genome engineering
Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.; ...
2017-05-25
Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
Yousefi, Siamak; Balasubramanian, Madhusudhanan; Goldbaum, Michael H; Medeiros, Felipe A; Zangwill, Linda M; Weinreb, Robert N; Liebmann, Jeffrey M; Girkin, Christopher A; Bowd, Christopher
2016-05-01
To validate Gaussian mixture-model with expectation maximization (GEM) and variational Bayesian independent component analysis mixture-models (VIM) for detecting glaucomatous progression along visual field (VF) defect patterns (GEM-progression of patterns (POP) and VIM-POP). To compare GEM-POP and VIM-POP with other methods. GEM and VIM models separated cross-sectional abnormal VFs from 859 eyes and normal VFs from 1117 eyes into abnormal and normal clusters. Clusters were decomposed into independent axes. The confidence limit (CL) of stability was established for each axis with a set of 84 stable eyes. Sensitivity for detecting progression was assessed in a sample of 83 eyes with known progressive glaucomatous optic neuropathy (PGON). Eyes were classified as progressed if any defect pattern progressed beyond the CL of stability. Performance of GEM-POP and VIM-POP was compared to point-wise linear regression (PLR), permutation analysis of PLR (PoPLR), and linear regression (LR) of mean deviation (MD), and visual field index (VFI). Sensitivity and specificity for detecting glaucomatous VFs were 89.9% and 93.8%, respectively, for GEM and 93.0% and 97.0%, respectively, for VIM. Receiver operating characteristic (ROC) curve areas for classifying progressed eyes were 0.82 for VIM-POP, 0.86 for GEM-POP, 0.81 for PoPLR, 0.69 for LR of MD, and 0.76 for LR of VFI. GEM-POP was significantly more sensitive to PGON than PoPLR and linear regression of MD and VFI in our sample, while providing localized progression information. Detection of glaucomatous progression can be improved by assessing longitudinal changes in localized patterns of glaucomatous defect identified by unsupervised machine learning.
Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David
2017-10-01
Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486, P < .001). Means of software machine-derived values differed significantly from actual PLT yield, 4.72 × 10^11 vs. 6.12 × 10^11, respectively (P < .001). The following equation was developed to adjust these values: actual PLT yield = 0.221 + (1.254 × theoretical platelet yield). ROC curve model showed an optimal apheresis device software prediction cut-off of 4.65 × 10^11 to obtain a DP, with a sensitivity of 82.2%, specificity of 93.3%, and an area under the curve (AUC) of 0.909. Trima Accel v6.0 software consistently underestimated PLT yields. Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
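The correction line and the cut-off reported in this abstract can be applied directly. The sketch below uses the published coefficients, with yields in units of 10^11 platelets. The function names are hypothetical, and treating the cut-off as inclusive (greater than or equal) is an assumption, since the abstract does not say.

```python
# Values as reported in the abstract (units: 10^11 platelets).
INTERCEPT = 0.221
SLOPE = 1.254
DP_CUTOFF = 4.65  # software-predicted yield cut-off for a double product


def corrected_yield(predicted: float) -> float:
    """Adjust the software-predicted yield with the reported regression line."""
    return INTERCEPT + SLOPE * predicted


def likely_double_product(predicted: float) -> bool:
    """Apply the reported ROC cut-off (assumed inclusive) to the predicted yield."""
    return predicted >= DP_CUTOFF


mean_predicted = 4.72  # mean software prediction from the abstract
print(round(corrected_yield(mean_predicted), 2))  # 6.14
print(likely_double_product(mean_predicted))      # True
```

The corrected mean of about 6.14 × 10^11 sits close to the observed mean of 6.12 × 10^11, which is what a least-squares line fitted to the same data would be expected to give.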
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
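Of the schemes listed in this abstract, single-predictor OLS is the simplest to illustrate. The sketch below regresses observations on forecasts and applies the fitted line as a correction. The data and function names are hypothetical, and none of the other schemes (TDTR, GM, EVMOS) is shown.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


# Hypothetical forecast/observation pairs (the forecasts run too high).
forecasts = [12.0, 15.5, 9.0, 20.0, 17.5, 11.0]
observations = [10.5, 13.0, 8.5, 16.0, 14.5, 10.0]

intercept, slope = ols_fit(forecasts, observations)
corrected = [intercept + slope * f for f in forecasts]

print(mse(forecasts, observations), mse(corrected, observations))
```

On the training sample the corrected forecast can never have a larger mean squared error than the raw one, because the raw forecast corresponds to the special case of intercept 0 and slope 1. Out-of-sample skill is a separate question.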
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
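The metamodeling idea in this abstract can be sketched with one input parameter. The toy decision model, the parameter range, and the sample size below are all hypothetical. The point is only that, once the input is standardized, the regression intercept recovers the outcome at the mean input and the slope gives the change in outcome per standard deviation of the input.

```python
import random
import statistics


def toy_model(p_cure: float) -> float:
    """Hypothetical decision model: net benefit as a linear function of cure probability."""
    return 50_000.0 * p_cure - 10_000.0


random.seed(0)
inputs = [random.uniform(0.2, 0.6) for _ in range(1000)]  # simulated PSA draws
outcomes = [toy_model(p) for p in inputs]

# Standardize the input parameter before regressing the outcome on it.
mean_p = statistics.fmean(inputs)
sd_p = statistics.stdev(inputs)
z = [(p - mean_p) / sd_p for p in inputs]

# Ordinary least squares with a single standardized predictor.
mean_z = statistics.fmean(z)
mean_y = statistics.fmean(outcomes)
slope = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, outcomes))
         / sum((zi - mean_z) ** 2 for zi in z))
intercept = mean_y - slope * mean_z

print(intercept, toy_model(mean_p))  # intercept matches the base-case outcome
print(slope, 50_000.0 * sd_p)        # slope is the effect of a 1-SD change
```

With several parameters the same regression would be run on all standardized inputs at once, which needs a multiple-regression routine rather than the closed form used here.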
The association of genetic variants of type 2 diabetes with kidney function.
Franceschini, Nora; Shara, Nawar M; Wang, Hong; Voruganti, V Saroja; Laston, Sandy; Haack, Karin; Lee, Elisa T; Best, Lyle G; Maccluer, Jean W; Cochran, Barbara J; Dyer, Thomas D; Howard, Barbara V; Cole, Shelley A; North, Kari E; Umans, Jason G
2012-07-01
Type 2 diabetes is highly prevalent and is the major cause of progressive chronic kidney disease in American Indians. Genome-wide association studies identified several loci associated with diabetes but their impact on susceptibility to diabetic complications is unknown. We studied the association of 18 type 2 diabetes genome-wide association single-nucleotide polymorphisms (SNPs) with estimated glomerular filtration rate (eGFR; MDRD equation) and urine albumin-to-creatinine ratio in 6958 Strong Heart Study family and cohort participants. Center-specific residuals of eGFR and log urine albumin-to-creatinine ratio, obtained from linear regression models adjusted for age, sex, and body mass index, were regressed onto SNP dosage using variance component models in family data and linear regression in unrelated individuals. Estimates were then combined across centers. Four diabetic loci were associated with eGFR and one locus with urine albumin-to-creatinine ratio. A SNP in the WFS1 gene (rs10010131) was associated with higher eGFR in younger individuals and with increased albuminuria. SNPs in the FTO, KCNJ11, and TCF7L2 genes were associated with lower eGFR, but not albuminuria, and were not significant in prospective analyses. Our findings suggest a shared genetic risk for type 2 diabetes and its kidney complications, and a potential role for WFS1 in early-onset diabetic nephropathy in American Indian populations.
Bokhari, Syed Akhtar H; Khan, Ayyaz A; Butt, Arshad K; Hanif, Mohammad; Izhar, Mateen; Tatakis, Dimitris N; Ashfaq, Mohammad
2014-11-01
Few studies have examined the relationship of individual periodontal parameters with individual systemic biomarkers. This study assessed the possible association between specific clinical parameters of periodontitis and systemic biomarkers of coronary heart disease risk in coronary heart disease patients with periodontitis. Angiographically proven coronary heart disease patients with periodontitis (n = 317), aged >30 years and without other systemic illness were examined. Periodontal clinical parameters of bleeding on probing (BOP), probing depth (PD), and clinical attachment level (CAL) and systemic levels of high-sensitivity C-reactive protein (CRP), fibrinogen (FIB) and white blood cells (WBC) were noted and analyzed to identify associations through linear and stepwise multiple regression analyses. Unadjusted linear regression showed significant associations between periodontal and systemic parameters; the strongest association (r = 0.629; p < 0.001) was found between BOP and CRP levels, the periodontal and systemic inflammation marker, respectively. Stepwise regression analysis models revealed that BOP was a predictor of systemic CRP levels (p < 0.0001). BOP was the only periodontal parameter significantly associated with each systemic parameter (CRP, FIB, and WBC). In coronary heart disease patients with periodontitis, BOP is strongly associated with systemic CRP levels; this association possibly reflects the potential significance of the local periodontal inflammatory burden for systemic inflammation. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Aptel, Florent; Sayous, Romain; Fortoul, Vincent; Beccat, Sylvain; Denis, Philippe
2010-12-01
To evaluate and compare the regional relationships between visual field sensitivity and retinal nerve fiber layer (RNFL) thickness as measured by spectral-domain optical coherence tomography (OCT) and scanning laser polarimetry. Prospective cross-sectional study. One hundred and twenty eyes of 120 patients (40 with healthy eyes, 40 with suspected glaucoma, and 40 with glaucoma) were tested on Cirrus-OCT, GDx VCC, and standard automated perimetry. Raw data on RNFL thickness were extracted for 256 peripapillary sectors of 1.40625 degrees each for the OCT measurement ellipse and 64 peripapillary sectors of 5.625 degrees each for the GDx VCC measurement ellipse. Correlations between peripapillary RNFL thickness in 6 sectors and visual field sensitivity in the 6 corresponding areas were evaluated using linear and logarithmic regression analysis. Receiver operating curve areas were calculated for each instrument. With spectral-domain OCT, the correlations (r(2)) between RNFL thickness and visual field sensitivity ranged from 0.082 (nasal RNFL and corresponding visual field area, linear regression) to 0.726 (supratemporal RNFL and corresponding visual field area, logarithmic regression). By comparison, with GDx-VCC, the correlations ranged from 0.062 (temporal RNFL and corresponding visual field area, linear regression) to 0.362 (supratemporal RNFL and corresponding visual field area, logarithmic regression). In pairwise comparisons, these structure-function correlations were generally stronger with spectral-domain OCT than with GDx VCC and with logarithmic regression than with linear regression. The largest areas under the receiver operating curve were seen for OCT superior thickness (0.963 ± 0.022; P < .001) in eyes with glaucoma and for OCT average thickness (0.888 ± 0.072; P < .001) in eyes with suspected glaucoma. 
The structure-function relationship was significantly stronger with spectral-domain OCT than with scanning laser polarimetry, and was better expressed logarithmically than linearly. Measurements with these 2 instruments should not be considered to be interchangeable. Copyright © 2010 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
Unit Cohesion and the Surface Navy: Does Cohesion Affect Performance
1989-12-01
v. 68, 1968. Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. Rand Corporation R-2607...Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. SAS User’s Guide: Basics, Version 5 ed
1990-03-01
and M.H. Kutner. Applied Linear Regression Models. Homewood IL: Richard D. Irwin Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM...Control Variates in Simulation," European Journal of Operational Research, 42: (1989). Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Regression Models
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course
ERIC Educational Resources Information Center
Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna
2010-01-01
Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
Some Applied Research Concerns Using Multiple Linear Regression Analysis.
ERIC Educational Resources Information Center
Newman, Isadore; Fraas, John W.
The intention of this paper is to provide an overall reference on how a researcher can apply multiple linear regression in order to utilize the advantages that it has to offer. The advantages and some concerns expressed about the technique are examined. A number of practical ways by which researchers can deal with such concerns as…
ERIC Educational Resources Information Center
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Quantum State Tomography via Linear Regression Estimation
Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan
2013-01-01
A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d^4) where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
Predicting story goodness performance from cognitive measures following traumatic brain injury.
Lê, Karen; Coelho, Carl; Mozeiko, Jennifer; Krueger, Frank; Grafman, Jordan
2012-05-01
This study examined the prediction of performance on measures of the Story Goodness Index (SGI; Lê, Coelho, Mozeiko, & Grafman, 2011) from executive function (EF) and memory measures following traumatic brain injury (TBI). It was hypothesized that EF and memory measures would significantly predict SGI outcomes. One hundred sixty-seven individuals with TBI participated in the study. Story retellings were analyzed using the SGI protocol. Three cognitive measures--Delis-Kaplan Executive Function System (D-KEFS; Delis, Kaplan, & Kramer, 2001) Sorting Test, Wechsler Memory Scale--Third Edition (WMS-III; Wechsler, 1997) Working Memory Primary Index (WMI), and WMS-III Immediate Memory Primary Index (IMI)--were entered into a multiple linear regression model for each discourse measure. Two sets of regression analyses were performed, the first with the Sorting Test as the first predictor and the second with it as the last. The first set of regression analyses identified the Sorting Test and IMI as the only significant predictors of performance on measures of the SGI. The second set identified all measures as significant predictors when evaluating each step of the regression function. The cognitive variables predicted performance on the SGI measures, although there were differences in the amount of explained variance. The results (a) suggest that storytelling ability draws on a number of underlying skills and (b) underscore the importance of using discrete cognitive tasks rather than broad cognitive indices to investigate the cognitive substrates of discourse.
NASA Astrophysics Data System (ADS)
Underwood, Kristen L.; Rizzo, Donna M.; Schroth, Andrew W.; Dewoolkar, Mandar M.
2017-12-01
Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge—which has been used previously as the default breakpoint in segmented regression models to discern differences between pre and post-threshold export regimes. We then applied a Self-Organizing Map (SOM), which partitioned the watersheds into clusters of TSS, PP, and DP export regimes using watershed characteristics, as well as Bayesian regression intercepts and slopes. A SOM defined two clusters of high-flux basins, one where PP flux was predominantly episodic and hydrologically driven; and another in which the sediment and nutrient sourcing and mobilization were more bimodal, resulting from both hydrologic processes at post-threshold discharges and reactive processes (e.g., nutrient cycling or lateral/vertical exchanges of fine sediment) at prethreshold discharges. A separate DP SOM defined two high-flux clusters exhibiting a bimodal concentration-discharge response, but driven by differing land use. Our novel framework shows promise as a tool with broad management application that provides insights into landscape drivers of riverine solute and sediment export.
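The threshold search described in this abstract can be illustrated with a much simpler stand-in. The sketch below is not the Bayesian inference used in the study: it is a plain least-squares grid search that fits two separate lines on either side of each candidate breakpoint and keeps the candidate with the smallest total residual sum of squares. The data and function names are hypothetical.

```python
def line_rss(x, y):
    """Residual sum of squares of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def best_threshold(x, y, min_points=3):
    """Grid-search the breakpoint that minimizes the total RSS of two lines."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        rss = line_rss(xs[:i], ys[:i]) + line_rss(xs[i:], ys[i:])
        candidate = (xs[i - 1] + xs[i]) / 2.0  # midpoint between neighbours
        if best is None or rss < best[1]:
            best = (candidate, rss)
    return best[0]


# Hypothetical log-discharge and log-concentration values with a regime
# change at 9.5: shallow slope below the threshold, steep slope above it.
log_q = [float(i) for i in range(20)]
log_c = [0.2 * q if q < 9.5 else 1.9 + (q - 9.5) for q in log_q]

print(best_threshold(log_q, log_c))  # 9.5
```

Unlike a Bayesian fit, this search returns a single point estimate with no uncertainty on the threshold position, and it does not force the two segments to meet at the breakpoint.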
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
Shared Decision-Making among Caregivers and Health Care Providers of Youth with Type 1 Diabetes
Valenzuela, Jessica M.; Smith, Laura B.; Stafford, Jeanette M.; Andrews, S.; D’Agostino, Ralph B.; Lawrence, Jean M.; Yi-Frazier, Joyce P.; Seid, Michael; Dolan, Lawrence M.
2014-01-01
The present study aimed to examine perceptions of shared decision-making (SDM) in caregivers of youth with type 1 diabetes (T1D). Interview, survey data, and HbA1c assays were gathered from caregivers of 439 youth with T1D aged 3–18 years. Caregiver-report indicated high perceived SDM during medical visits. Multivariable linear regression indicated that greater SDM is associated with lower HbA1c, older child age, and having a pediatric endocrinologist provider. Multiple logistic regression found that caregivers who did not perceive having made any healthcare decisions in the past year were more likely to identify a non-pediatric endocrinologist provider and to report less optimal diabetes self-care. Findings suggest that youth whose caregivers report greater SDM may show benefits in terms of self-care and glycemic control. Future research should examine the role of youth in SDM and how best to identify youth and families with low SDM in order to improve care. PMID:24952739
Duvall, Susanne W.; Erickson, Sarah J.; MacLean, Peggy; Lowe, Jean R.
2014-01-01
The goal was to identify perinatal predictors of early executive dysfunction in preschoolers born very low birth weight. Fifty-seven preschoolers completed three executive function tasks (Dimensional Change Card Sort-Separated (inhibition, working memory and cognitive flexibility), Bear Dragon (inhibition and working memory) and Gift Delay Open (inhibition)). Relationships between executive function and perinatal medical severity factors (gestational age, days on ventilation, size for gestational age, maternal steroids and number of surgeries), and chronological age were investigated by multiple linear regression and logistic regression. Different perinatal medical severity factors were predictive of executive function tasks, with gestational age predicting Bear Dragon and Gift Open; and number of surgeries and maternal steroids predicting performance on Dimensional Change Card Sort-Separated. By understanding the relationship between perinatal medical severity factors and preschool executive outcomes, we may be able to identify children at highest risk for future executive dysfunction, thereby focusing targeted early intervention services. PMID:25117418
Ma, Wan-Li; Sun, De-Zhi; Shen, Wei-Guo; Yang, Meng; Qi, Hong; Liu, Li-Yan; Shen, Ji-Min; Li, Yi-Fan
2011-07-01
A comprehensive sampling campaign was carried out to study atmospheric concentration of polycyclic aromatic hydrocarbons (PAHs) in Beijing and to evaluate the effectiveness of source control strategies in reducing PAHs pollution after the 29th Olympic Games. The sub-cooled liquid vapor pressure (logP(L)(o))-based model and octanol-air partition coefficient (K(oa))-based model were applied based on each seasonal dataset. Regression analysis among log K(P), logP(L)(o) and log K(oa) exhibited highly significant correlations for four seasons. Source factors were identified by principal component analysis and contributions were further estimated by multiple linear regression. Pyrogenic sources and coke oven emission were identified as major sources for both the non-heating and heating seasons. Compared with values in the literature, the mean PAH concentrations before and after the 29th Olympic Games were reduced by more than 60%, indicating that the source control measures were effective for reducing PAHs pollution in Beijing. Copyright © 2011 Elsevier Ltd. All rights reserved.
Fonseca-Machado, Mariana de Oliveira; Monteiro, Juliana Cristina dos Santos; Haas, Vanderlei José; Abrão, Ana Cristina Freitas de Vilhena; Gomes-Sponholz, Flávia
2015-01-01
Objective: to identify the relationship between posttraumatic stress disorder, trait and state anxiety, and intimate partner violence during pregnancy. Method: observational, cross-sectional study developed with 358 pregnant women. The Posttraumatic Stress Disorder Checklist - Civilian Version was used, as well as the State-Trait Anxiety Inventory and an adapted version of the instrument used in the World Health Organization Multi-country Study on Women's Health and Domestic Violence. Results: after adjustment in the multiple logistic regression model, intimate partner violence occurring during pregnancy was associated with the indication of posttraumatic stress disorder. The adjusted multiple linear regression models showed that the victims of violence in the current pregnancy had higher symptom scores of trait and state anxiety than non-victims. Conclusion: recognizing intimate partner violence as a clinically relevant and identifiable risk factor for the occurrence of anxiety disorders during pregnancy can be a first step in the prevention thereof. PMID:26487135
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway
Wang, Fengfeng; Wong, S. C. Cesar; Chan, Lawrence W. C.; Cho, William C. S.; Yip, S. P.; Yung, Benjamin Y. M.
2014-01-01
Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC. PMID:24895601
Stratospheric Ozone Trends and Variability as Seen by SCIAMACHY from 2002 to 2012
NASA Technical Reports Server (NTRS)
Gebhardt, C.; Rozanov, A.; Hommel, R.; Weber, M.; Bovensmann, H.; Burrows, J. P.; Degenstein, D.; Froidevaux, L.; Thompson, A. M.
2014-01-01
Vertical profiles of the rate of linear change (trend) in the altitude range 15-50 km are determined from decadal O3 time series obtained from SCIAMACHY/ENVISAT measurements in limb-viewing geometry. The trends are calculated by using a multivariate linear regression. Seasonal variations, the quasi-biennial oscillation, signatures of the solar cycle and the El Nino-Southern Oscillation are accounted for in the regression. The time range of trend calculation is August 2002-April 2012. The analysis focuses on the zonal bands of 20 deg N - 20 deg S (tropics), 60 - 50 deg N, and 50 - 60 deg S (midlatitudes). In the tropics, positive trends of up to 5% per decade between 20 and 30 km and negative trends of up to 10% per decade between 30 and 38 km are identified. Positive O3 trends of around 5% per decade are found in the upper stratosphere in the tropics and at midlatitudes. Comparisons between SCIAMACHY and EOS MLS show reasonable agreement both in the tropics and at midlatitudes for most altitudes. In the tropics, measurements from OSIRIS/Odin and SHADOZ are also analysed. These yield rates of linear change of O3 similar to those from SCIAMACHY. However, the trends from SCIAMACHY near 34 km in the tropics are larger than MLS and OSIRIS by a factor of around two.
Pre-natal exposures to cocaine and alcohol and physical growth patterns to age 8 years
Lumeng, Julie C.; Cabral, Howard J.; Gannon, Katherine; Heeren, Timothy; Frank, Deborah A.
2007-01-01
Two hundred and two primarily African American/Caribbean children (classified by maternal report and infant meconium as 38 heavier, 74 lighter and 89 not cocaine-exposed) were measured repeatedly from birth to age 8 years to assess whether there is an independent effect of prenatal cocaine exposure on physical growth patterns. Children with fetal alcohol syndrome identifiable at birth were excluded. At birth, cocaine and alcohol exposures were significantly and independently associated with lower weight, length and head circumference in cross-sectional multiple regression analyses. The relationship over time of pre-natal exposures to weight, height, and head circumference was then examined by multiple linear regression using mixed linear models including covariates: child’s gestational age, gender, ethnicity, age at assessment, current caregiver, birth mother’s use of alcohol, marijuana and tobacco during the pregnancy and pre-pregnancy weight (for child’s weight) and height (for child’s height and head circumference). The cocaine effects did not persist beyond infancy in piecewise linear mixed models, but a significant and independent negative effect of pre-natal alcohol exposure persisted for weight, height, and head circumference. Catch-up growth in cocaine-exposed infants occurred primarily by 6 months of age for all growth parameters, with some small fluctuations in growth rates in the preschool age range but no detectable differences between heavier versus unexposed nor lighter versus unexposed thereafter. PMID:17412558
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LETd) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LETd based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. 
As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LETd based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
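Deciding whether a higher-degree polynomial fits better than a lower-degree one is commonly done with an F statistic for nested models. The sketch below is one such approach and is not necessarily the test used by the authors. It is unweighted, compares only degree 1 against degree 2, uses made-up data, and stops at the F statistic because a p-value would need an F distribution from a statistics library.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    p = degree + 1
    rows = [[x ** k for k in range(p)] for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def residual_sum_of_squares(xs, ys, coeffs):
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

def nested_f_statistic(xs, ys, low_degree, high_degree):
    """F statistic for the extra terms of the higher-degree polynomial."""
    rss_low = residual_sum_of_squares(xs, ys, fit_polynomial(xs, ys, low_degree))
    rss_high = residual_sum_of_squares(xs, ys, fit_polynomial(xs, ys, high_degree))
    df_num = high_degree - low_degree          # number of added coefficients
    df_den = len(xs) - (high_degree + 1)       # residual degrees of freedom
    return ((rss_low - rss_high) / df_num) / (rss_high / df_den)

# Hypothetical data with a mild quadratic component and a small perturbation.
xs = [float(i) for i in range(10)]
ys = [1.0 + 0.1 * x + 0.02 * x ** 2 + 0.01 * (-1) ** i for i, x in enumerate(xs)]
f_stat = nested_f_statistic(xs, ys, 1, 2)
print(f"F statistic, linear vs quadratic: {f_stat:.1f}")
```

A weighted variant would multiply each squared residual and each normal-equation term by the inverse variance of the corresponding data point.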
Computer Mapping of Water Quality in Saginaw Bay with LANDSAT Digital Data
NASA Technical Reports Server (NTRS)
Rogers, R. H. (Principal Investigator); Shah, N. J.; Smith, V. E.; Mckeon, J. B.
1976-01-01
The author has identified the following significant results. LANDSAT digital data and ground truth measurements for Saginaw Bay (Lake Huron), Michigan, for 31 July 1975 were correlated by stepwise linear regression and the resulting equations used to estimate invisible water quality parameters in nonsampled areas. Chloride, conductivity, total Kjeldahl nitrogen, total phosphorus, and chlorophyll a were best correlated with the ratio of LANDSAT Band 4 to Band 5. Temperature and Secchi depth correlate best with Band 5.
Regression of non-linear coupling of noise in LIGO detectors
NASA Astrophysics Data System (ADS)
Da Silva Costa, C. F.; Billman, C.; Effler, A.; Klimenko, S.; Cheng, H.-P.
2018-03-01
In 2015, after their upgrade, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors started acquiring data. The effort to improve their sensitivity has never stopped since then. The goal to achieve design sensitivity is challenging. Environmental and instrumental noise couple to the detector output with different, linear and non-linear, coupling mechanisms. The noise regression method we use is based on the Wiener–Kolmogorov filter, which uses witness channels to make noise predictions. We present here how this method helped to determine complex non-linear noise couplings in the output mode cleaner and in the mirror suspension system of the LIGO detector.
Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan
2012-12-01
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere polybutadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization (ACO) is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Henneghan, Ashley M; Palesh, Oxana; Harrison, Michelle; Kesler, Shelli R
2018-07-15
The purpose of this study is to explore 13 cytokine predictors of chemotherapy-related cognitive impairment (CRCI) in breast cancer survivors (BCS) 6 months to 10 years after chemotherapy completion using a multivariate, non-parametric approach. Cross sectional data collection included completion of a survey, cognitive testing, and non-fasting blood from 66 participants. Data were analyzed using random forest regression to identify the most significant predictors for each of the cognitive test scores. A different cytokine profile predicted each cognitive test. Adjusted R² for each model ranged from 0.71-0.77 (p's < 9.50 -10 ). The relationships between all the cytokine predictors and cognitive test scores were non-linear. Our findings are unique to the field of CRCI and suggest non-linear cytokine specificity to neural networks underlying cognitive functions assessed in this study. Copyright © 2018 Elsevier B.V. All rights reserved.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
Marcotte, Thomas D.; Deutsch, Reena; Michael, Benedict Daniel; Franklin, Donald; Cookson, Debra Rosario; Bharti, Ajay R.; Grant, Igor; Letendre, Scott L.
2013-01-01
Background Neurocognitive (NC) impairment (NCI) occurs commonly in people living with HIV. Despite substantial effort, no biomarkers have been sufficiently validated for diagnosis and prognosis of NCI in the clinic. The goal of this project was to identify diagnostic or prognostic biomarkers for NCI in a comprehensively characterized HIV cohort. Methods Multidisciplinary case review selected 98 HIV-infected individuals and categorized them into four NC groups using normative data: stably normal (SN), stably impaired (SI), worsening (Wo), or improving (Im). All subjects underwent comprehensive NC testing, phlebotomy, and lumbar puncture at two timepoints separated by a median of 6.2 months. Eight biomarkers were measured in CSF and blood by immunoassay. Results were analyzed using mixed model linear regression and staged recursive partitioning. Results At the first visit, subjects were mostly middle-aged (median 45) white (58%) men (84%) who had AIDS (70%). Of the 73% who took antiretroviral therapy (ART), 54% had HIV RNA levels below 50 c/mL in plasma. Mixed model linear regression identified that only MCP-1 in CSF was associated with neurocognitive change group. Recursive partitioning models aimed at diagnosis (i.e., correctly classifying neurocognitive status at the first visit) were complex and required most biomarkers to achieve misclassification limits. In contrast, prognostic models were more efficient. A combination of three biomarkers (sCD14, MCP-1, SDF-1α) correctly classified 82% of Wo and SN subjects, including 88% of SN subjects. A combination of two biomarkers (MCP-1, TNF-α) correctly classified 81% of Im and SI subjects, including 100% of SI subjects. Conclusions This analysis of well-characterized individuals identified concise panels of biomarkers associated with NC change. Across all analyses, the two most frequently identified biomarkers were sCD14 and MCP-1, indicators of monocyte/macrophage activation. 
While the panels differed depending on the outcome and on the degree of misclassification, nearly all stable patients were correctly classified. PMID:24101401
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Adding a Parameter Increases the Variance of an Estimated Regression Function
ERIC Educational Resources Information Center
Withers, Christopher S.; Nadarajah, Saralees
2011-01-01
The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
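The variance claim can be checked in the simplest case with closed-form expressions. The sketch below compares an intercept-only model with a model that adds one slope, under the usual assumptions of independent errors with common variance sigma2. This setting is an assumption chosen for illustration, since the abstract is truncated and the authors' general result is not reproduced here.

```python
def fitted_mean_variance(xs, x0, sigma2, with_slope):
    """Variance of the estimated regression function at x0.

    Intercept only:        sigma2 / n
    Intercept plus slope:  sigma2 * (1/n + (x0 - xbar)^2 / Sxx)
    """
    n = len(xs)
    variance = sigma2 / n
    if with_slope:
        xbar = sum(xs) / n
        sxx = sum((x - xbar) ** 2 for x in xs)
        variance += sigma2 * (x0 - xbar) ** 2 / sxx
    return variance

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical design points
for x0 in (3.0, 5.0):
    small = fitted_mean_variance(xs, x0, 1.0, with_slope=False)
    large = fitted_mean_variance(xs, x0, 1.0, with_slope=True)
    print(f"x0={x0}: intercept-only {small:.3f}, with slope {large:.3f}")
```

The extra term is never negative, so the variance with the added predictor is at least as large as without it, with equality only at the mean of the design points.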
Using nonlinear quantile regression to estimate the self-thinning boundary curve
Quang V. Cao; Thomas J. Dean
2015-01-01
The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
Simultaneous spectrophotometric determination of salbutamol and bromhexine in tablets.
Habib, I H I; Hassouna, M E M; Zaki, G A
2005-03-01
Typical anti-mucolytic drugs called salbutamol hydrochloride and bromhexine sulfate encountered in tablets were determined simultaneously either by using linear regression at zero-crossing wavelengths of the first derivative of the UV spectra or by application of the multiple linear partial least squares regression method. The results obtained by the two proposed mathematical methods were compared with those obtained by the HPLC technique.
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Image interpolation via regularized local linear regression.
Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang
2011-12-01
The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model where we replace the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l(2)-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
Newton-Howes, Giles
2014-02-01
The aim of this study was to assess the degree to which mental state disorder and personality disorder impact on social functioning in patients engaged in secondary mental health care in New Zealand. Patients were interviewed using peer-reviewed instruments able to provide an indication of severity to assess their social functioning, personality status and diagnosis. Univariate correlations and linear regression were used to identify the association between social functioning, mental state disorder and personality. Using simple correlations, all diagnostic categories were associated with declines in social functioning. In the regression analysis, depression and personality dysfunction accounted for 48% of the variance in social functioning. For patients engaged in secondary care, depression and personality dysfunction are significantly associated with poorer social functioning.
Applied Multiple Linear Regression: A General Research Strategy
ERIC Educational Resources Information Center
Smith, Brandon B.
1969-01-01
Illustrates some of the basic concepts and procedures for using regression analysis in experimental design, analysis of variance, analysis of covariance, and curvilinear regression. Applications to evaluation of instruction and vocational education programs are illustrated. (GR)
Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma
2016-01-01
Aim: This research was conducted to determine the parameters that most affect the hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influential one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. The Alexandria strain had the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
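The split-half evaluation described above (fit on one random half, assess on the hold-out half) can be sketched as follows. Only the linear regression side is shown, with a single synthetic predictor. The fuzzy logic model, the real survey data and the variable names are not available here, so everything in the block is hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def mean_squared_error(model, xs, ys):
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)            # fixed seed so the split is reproducible
cases = []
for _ in range(664):              # same sample size as in the abstract
    x = rng.uniform(0.0, 10.0)
    cases.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)))

rng.shuffle(cases)
train, holdout = cases[:332], cases[332:]

model = fit_line(*zip(*train))
fit_error = mean_squared_error(model, *zip(*train))        # first comparison
holdout_error = mean_squared_error(model, *zip(*holdout))  # second comparison
print(f"in-sample MSE {fit_error:.3f}, hold-out MSE {holdout_error:.3f}")
```

A competing model would be evaluated on exactly the same two halves so that both error figures are comparable.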
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R² from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
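A binomial test for a single family can be sketched as below. The null model assumed here is that every species in the family is medicinal with the flora-wide proportion; the family size, the count and the proportion are invented for illustration, and the authors' exact binomial model may differ.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def upper_tail(k, n, p):
    """P(X >= k): evidence that a family is over-represented."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

def lower_tail(k, n, p):
    """P(X <= k): evidence that a family is under-represented."""
    return sum(binomial_pmf(i, n, p) for i in range(0, k + 1))

# Hypothetical family: 40 species in the flora, 12 of them used medicinally,
# against an assumed flora-wide medicinal proportion of 0.15.
family_species, family_medicinal, flora_proportion = 40, 12, 0.15
p_over = upper_tail(family_medicinal, family_species, flora_proportion)
p_under = lower_tail(family_medicinal, family_species, flora_proportion)
print(f"P(X >= 12) = {p_over:.4f}, P(X <= 12) = {p_under:.4f}")
```

When many families are tested in this way, some correction for multiple comparisons is usually needed before calling any single family significant.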
An Application to the Prediction of LOD Change Based on General Regression Neural Network
NASA Astrophysics Data System (ADS)
Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.
2011-07-01
Traditional prediction of the LOD (length of day) change was based on linear models, such as the least squares model and the autoregressive technique. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network, the general regression neural network (GRNN) model, to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the GRNN model is efficient and feasible for the prediction of the LOD change.
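A general regression neural network predicts with a Gaussian-kernel weighted average of the training targets, which is the same computation as Nadaraya-Watson kernel regression. The sketch below shows that core step for a scalar input. The smoothing width sigma, the toy data and the function name are assumptions, and the input construction used for actual LOD forecasting is not reproduced.

```python
import math

def grnn_predict(x_query, train_x, train_y, sigma):
    """Kernel-weighted average of training targets (the GRNN output)."""
    # Pattern layer: one Gaussian unit per training sample.
    weights = [math.exp(-((x_query - x) ** 2) / (2.0 * sigma ** 2)) for x in train_x]
    # Summation and output layers: weighted sum divided by the sum of weights.
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Toy non-linear series (hypothetical): samples of a sine curve.
train_x = [0.1 * i for i in range(63)]
train_y = [math.sin(x) for x in train_x]

for query in (1.0, 2.5, 4.0):
    estimate = grnn_predict(query, train_x, train_y, sigma=0.1)
    print(f"x={query}: GRNN {estimate:.3f}, true {math.sin(query):.3f}")
```

The only tuning parameter is sigma: small values follow the training points closely, large values smooth towards the overall mean of the targets.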
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. 
The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale. PMID:26964095
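The four validation statistics quoted above (ME, MAE, RMSE and R2) can be computed as in the following sketch. The sign convention for the mean error (predicted minus observed) and the definition of R2 as one minus the ratio of residual to total sum of squares are assumptions, since the abstract does not state them, and the numbers are invented.

```python
import math

def validation_metrics(observed, predicted):
    """Return mean error, mean absolute error, RMSE and R-squared."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]   # predicted minus observed
    mean_error = sum(errors) / n
    mean_abs_error = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    obs_mean = sum(observed) / n
    ss_total = sum((o - obs_mean) ** 2 for o in observed)
    ss_residual = sum(e ** 2 for e in errors)
    return mean_error, mean_abs_error, rmse, 1.0 - ss_residual / ss_total

# Hypothetical validation values in mg/kg.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.5]
me, mae, rmse, r2 = validation_metrics(observed, predicted)
print(f"ME={me:.3f} MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

Computing all four on the same held-out validation set, as the study does with its 54 samples, keeps the comparison between models fair.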
DOT National Transportation Integrated Search
2016-09-01
We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...
Huang, Hairong; Xu, Zanzan; Shao, Xianhong; Wismeijer, Daniel; Sun, Ping; Wang, Jingxiao
2017-01-01
Objectives This study identified potential general influencing factors for a mathematical prediction of implant stability quotient (ISQ) values in clinical practice. Methods We collected the ISQ values of 557 implants from 2 different brands (SICace and Osstem) placed by 2 surgeons in 336 patients. Surgeon 1 placed 329 SICace implants, and surgeon 2 placed 113 SICace implants and 115 Osstem implants. ISQ measurements were taken at T1 (immediately after implant placement) and T2 (before dental restoration). A multivariate linear regression model was used to analyze the influence of the following 11 candidate factors for stability prediction: sex, age, maxillary/mandibular location, bone type, immediate/delayed implantation, bone grafting, insertion torque, I-stage or II-stage healing pattern, implant diameter, implant length and T1-T2 time interval. Results The need for bone grafting as a predictor significantly influenced ISQ values in all three groups at T1 (weight coefficients ranging from -4 to -5). In contrast, implant diameter consistently influenced the ISQ values in all three groups at T2 (weight coefficients ranging from 3.4 to 4.2). Other factors, such as sex, age, I/II-stage implantation and bone type, did not significantly influence ISQ values at T2, and implant length did not significantly influence ISQ values at T1 or T2. Conclusions These findings provide a rational basis for mathematical models to quantitatively predict the ISQ values of implants in clinical practice. PMID:29084260
Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation
NASA Astrophysics Data System (ADS)
Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.
A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC) pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
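The estimators compared above can be illustrated with a short sketch of Deming regression next to the ratio of averages. Here `delta` is the assumed ratio of the error variance in OC to the error variance in EC, the fitted slope plays the role of (OC/EC) pri, and the intercept plays the role of OC non-comb. The concentrations are invented, and the York regression with per-point uncertainties is not shown.

```python
import math

def deming_fit(xs, ys, delta=1.0):
    """Deming regression; delta = var(errors in y) / var(errors in x).

    Returns (intercept, slope).
    """
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    return ybar - slope * xbar, slope

def ratio_of_averages(ec, oc):
    """ROA estimate of the primary OC/EC ratio; implies a zero intercept."""
    return (sum(oc) / len(oc)) / (sum(ec) / len(ec))

def secondary_oc(oc_total, ec, primary_ratio, oc_noncomb=0.0):
    """EC tracer estimate: total OC minus the primary combustion and non-combustion parts."""
    return oc_total - (primary_ratio * ec + oc_noncomb)

# Hypothetical primary-dominated samples (arbitrary concentration units).
ec = [0.4, 0.8, 1.1, 1.6, 2.3]
oc = [1.3, 2.1, 2.6, 3.7, 5.0]
intercept, slope = deming_fit(ec, oc, delta=1.0)
print(f"Deming: (OC/EC) pri = {slope:.2f}, OC non-comb = {intercept:.2f}")
print(f"ROA:    (OC/EC) pri = {ratio_of_averages(ec, oc):.2f}")
```

For large `delta` (negligible error in EC) the Deming slope approaches the ordinary least squares slope, which is why the choice of `delta` matters when both variables carry measurement uncertainty.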
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
NASA Astrophysics Data System (ADS)
Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino
2018-07-01
Epidemiological studies generally use measurements of particulate matter with diameter less than 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data has considerable potential in predicting PM2.5 concentrations, and thus provides an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, the sine and cosine of the date over a period of 365.25 days, and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products also in combination with the testing of a non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of Brazilian Amazon Region.
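The seasonal sine and cosine terms mentioned above can be built as in the minimal sketch below. The column layout of `design_matrix` is a hypothetical illustration of AOD interaction terms, not the model actually fitted in the study, and the lagged-residual term is omitted.

```python
import numpy as np

PERIOD_DAYS = 365.25  # length of the annual cycle used for the harmonic terms

def seasonal_terms(day_index):
    """Sine and cosine of the date, with one full cycle every 365.25 days."""
    angle = 2.0 * np.pi * np.asarray(day_index, dtype=float) / PERIOD_DAYS
    return np.sin(angle), np.cos(angle)

def design_matrix(aod, temperature, humidity, day_index):
    """Hypothetical design: AOD and its interactions with meteorology and season."""
    aod = np.asarray(aod, dtype=float)
    s, c = seasonal_terms(day_index)
    return np.column_stack([
        np.ones_like(aod),          # intercept
        aod,                        # main AOD effect
        aod * np.asarray(temperature, dtype=float),
        aod * np.asarray(humidity, dtype=float),
        aod * s,                    # seasonal modulation of the AOD effect
        aod * c,
    ])

days = np.arange(5)
X = design_matrix([0.2, 0.3, 0.1, 0.4, 0.5], [27, 28, 26, 29, 30], [80, 75, 85, 70, 65], days)
```

Using both sine and cosine lets a linear fit represent a seasonal cycle of any phase with only two coefficients.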
NASA Technical Reports Server (NTRS)
Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.
1995-01-01
A major goal of airborne imaging spectrometry is to estimate the biochemical composition of vegetation canopies from reflectance spectra. Remotely-sensed estimates of foliar biochemical concentrations of forests would provide valuable indicators of ecosystem function at regional and eventually global scales. Empirical research has shown a relationship exists between the amount of radiation reflected from absorption features and the concentration of given biochemicals in leaves and canopies (Matson et al., 1994, Johnson et al., 1994). A technique commonly used to determine which wavelengths have the strongest correlation with the biochemical of interest is unguided (stepwise) multiple regression. Wavelengths are entered into a multivariate regression equation, in their order of importance, each contributing to the reduction of the variance in the measured biochemical concentration. A significant problem with the use of stepwise regression for determining the correlation between biochemical concentration and spectra is that of 'overfitting' as there are significantly more wavebands than biochemical measurements. This could result in the selection of wavebands which may be more accurately attributable to noise or canopy effects. In addition, there is a real problem of collinearity in that the individual biochemical concentrations may covary. A strong correlation between the reflectance at a given wavelength and the concentration of a biochemical of interest, therefore, may be due to the effect of another biochemical which is closely related. Furthermore, it is not always possible to account for potentially suitable waveband omissions in the stepwise selection procedure. This concern about the suitability of stepwise regression has been identified and acknowledged in a number of recent studies (Wessman et al., 1988, Curran, 1989, Curran et al., 1992, Peterson and Hubbard, 1992, Martine and Aber, 1994, Kupiec, 1994). 
These studies have pointed to the lack of a physical link between wavelengths chosen by stepwise regression and the biochemical of interest, and this in turn has cast doubts on the use of imaging spectrometry for the estimation of foliar biochemical concentrations at sites distant from the training sites. To investigate this problem, an analysis was conducted on the variation in canopy biochemical concentrations and reflectance spectra using forced entry linear regression.
The Correlation Between Metacognition Level with Self-Efficacy of Biology Education College Students
NASA Astrophysics Data System (ADS)
Ridlo, S.; Lutfiya, F.
2017-04-01
Self-efficacy is a strong predictor of academic achievement. Self-efficacy refers to the ability of college students to achieve the desired results. The metacognition level can influence college students' self-efficacy. This study aims to identify college students' metacognition level and self-efficacy, as well as to determine the relationship between self-efficacy and metacognition level for college students of Biology Education 2013, Semarang State University. The ex-post facto quantitative research was conducted on 99 students in Academic Year 2015/2016. The sample was determined by the saturation sampling technique. Data on self-efficacy were collected with the E-D scale. Data for assessing the metacognition level were collected with the Metacognitive Awareness Inventory. Data were analysed quantitatively by Pearson correlation and linear regression. Most college students have a high level of metacognition and average self-efficacy. The Pearson correlation coefficient was 0.367. This result showed that metacognition level and self-efficacy have a weak relationship. Based on the linear regression test, self-efficacy is influenced by metacognition level up to 13.5%. The results of the study showed that positive and significant relationships exist between metacognition level and self-efficacy. Therefore, if the metacognition level is high, then self-efficacy will also be high (appropriate).
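The two reported figures appear to be consistent with each other: in a simple linear regression with one predictor, the coefficient of determination equals the squared Pearson correlation. The abstract does not say how the 13.5% was computed, so the short check below is only a plausibility sketch.

```python
r = 0.367                      # Pearson correlation reported in the abstract
r_squared = r ** 2             # R^2 of a one-predictor linear regression
percent_explained = round(100 * r_squared, 1)
print(percent_explained)       # share of variance explained, in percent
```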
Yin, Xiaoxv; Yan, Shijiao; Tong, Yeqing; Peng, Xin; Yang, Tingting; Lu, Zuxun; Gong, Yanhong
2018-02-01
Tuberculosis (TB) poses a significant challenge to public health worldwide. Stigma is a major obstacle to TB control by leading to delay in diagnosis and treatment non-adherence. This study aimed to evaluate the status of TB-related stigma and its associated factors among TB patients in China. In this cross-sectional survey, 1342 TB patients were recruited from TB dispensaries in three counties in Hubei Province using a multistage sampling method and surveyed using a structured anonymous questionnaire including validated scales to measure TB-related stigma. A generalised linear regression model was used to identify the factors associated with TB-related stigma. The average score on the TB-related Stigma Scale was 9.33 (SD = 4.25). Generalised linear regression analysis revealed that knowledge about TB (β = -0.18, P = 0.0025), family function (β = -0.29, P < 0.0001) and doctor-patient communication (β = -0.32, P = 0.0005) were negatively associated with TB-related stigma. TB-related stigma was high among TB patients in China. Interventions concentrating on reducing TB patients' stigma in China should focus on improving patients' family function and patients' knowledge about TB. © 2017 John Wiley & Sons Ltd.
Gouvêa, Ana Cristina M S; Melo, Armindo; Santiago, Manuela C P A; Peixoto, Fernanda M; Freitas, Vitor; Godoy, Ronoel L O; Ferreira, Isabel M P L V O
2015-10-15
Neomitranthes obscura (DC.) N. Silveira is a Brazilian fruit belonging to the Myrtaceae family that contains anthocyanins in the peel and was studied for the first time in this work. Delphinidin-3-O-galactoside, delphinidin-3-O-glucoside, cyanidin-3-O-galactoside, cyanidin-3-O-glucoside, cyanidin-3-O-arabinoside, petunidin-3-O-glucoside, pelargonidin-3-O-glucoside, peonidin-3-O-galactoside, peonidin-3-O-glucoside and cyanidin-3-O-xyloside were separated and identified by LC/DAD/MS and by co-elution with standards. Reliable quantification of anthocyanins in the mature fruits was performed by HPLC/DAD using a weighted linear regression model from 0.05 to 50 mg of cyanidin-3-O-glucoside L(-1) because it gave better fit quality than least squares linear regression. Good precision and accuracy were obtained. The total anthocyanin content of mature fruits was 263.6 ± 8.2 mg of cyanidin-3-O-glucoside equivalents 100 g(-1) fresh weight, which was in the same range found in literature for anthocyanin rich fruits. Copyright © 2015. Published by Elsevier Ltd.
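A weighted calibration line over a wide concentration range, such as the 0.05 to 50 mg L(-1) range above, might be fitted as in this minimal sketch. The 1/x² weighting scheme, the toy detector responses, and the function name are assumptions; the abstract does not state which weights were used.

```python
import numpy as np

def weighted_calibration(conc, response, weights):
    """Weighted least-squares line: minimises sum(weights * residual**2).

    np.polyfit applies its `w` argument to the unsquared residuals,
    so the square root of the weights is passed.
    """
    slope, intercept = np.polyfit(conc, response, deg=1, w=np.sqrt(weights))
    return float(slope), float(intercept)

# Hypothetical calibration standards (mg/L) and noise-free detector responses
conc = np.array([0.05, 0.5, 5.0, 25.0, 50.0])
response = 12.0 * conc + 0.3

# Assumed 1/x^2 weights, which keep the low standards from being swamped
slope, intercept = weighted_calibration(conc, response, 1.0 / conc ** 2)
```

With unweighted least squares, the highest standards dominate the fit, which is the usual motivation for weighting when the range spans several orders of magnitude.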
Serum Iron Level Is Associated with Time to Antibiotics in Cystic Fibrosis.
Gifford, Alex H; Dorman, Dana B; Moulton, Lisa A; Helm, Jennifer E; Griffin, Mary M; MacKenzie, Todd A
2015-12-01
Serum levels of hepcidin-25, a peptide hormone that reduces blood iron content, are elevated when patients with cystic fibrosis (CF) develop pulmonary exacerbation (PEx). Because hepcidin-25 is unavailable as a clinical laboratory test, we questioned whether a one-time serum iron level was associated with the subsequent number of days until PEx, as defined by the need to receive systemic antibiotics (ABX) for health deterioration. Clinical, biochemical, and microbiological parameters were simultaneously checked in 54 adults with CF. Charts were reviewed to determine when they first experienced a PEx after these parameters were assessed. Time to ABX was compared in subgroups with and without specific attributes. Multivariate linear regression was used to identify parameters that significantly explained variation in time to ABX. In univariate analyses, time to ABX was significantly shorter in subjects with Aspergillus-positive sputum cultures and CF-related diabetes. Multivariate linear regression models demonstrated that shorter time to ABX was associated with younger age, lower serum iron level, and Aspergillus sputum culture positivity. Serum iron, age, and Aspergillus sputum culture positivity are factors associated with shorter time to subsequent PEx in CF adults. © 2015 Wiley Periodicals, Inc.
Factors associated with arterial stiffness in children aged 9-10 years
Batista, Milena Santos; Mill, José Geraldo; Pereira, Taisa Sabrina Silva; Fernandes, Carolina Dadalto Rocha; Molina, Maria del Carmen Bisi
2015-01-01
OBJECTIVE To analyze the factors associated with stiffness of the great arteries in prepubertal children. METHODS This study used a convenience sample of 231 schoolchildren aged 9-10 years enrolled in public and private schools in Vitória, ES, Southeastern Brazil, in 2010-2011. Anthropometric and hemodynamic data, blood pressure, and pulse wave velocity in the carotid-femoral segment were obtained. Data on current and previous health conditions were obtained by questionnaire and notes on the child’s health card. Multiple linear regression was applied to identify the partial and total contribution of the factors in determining the pulse wave velocity values. RESULTS Among the students, 50.2% were female and 55.4% were 10 years old. Among those classified in the last tertile of pulse wave velocity, 60.0% were overweight, with higher mean blood pressure, waist circumference, and waist-to-height ratio. Birth weight was not associated with pulse wave velocity. After multiple linear regression analysis, body mass index (BMI) and diastolic blood pressure remained in the model. CONCLUSIONS BMI was the most important factor in determining arterial stiffness in children aged 9-10 years. PMID:25902563
Racial Disparity in Renal Transplantation: Alemtuzumab the Great Equalizer?
Smith, Alison A; John, Mira M; Dortonne, Isabelle S; Paramesh, Anil S; Killackey, Mary; Jaffe, Bernard M; Buell, Joseph F
2015-10-01
Racial disparity as a barrier to successful outcomes in renal transplants for African Americans has been well described. Numerous unsuccessful attempts have been made to identify specific immunologic and socioeconomic factors. The objective of our study was to determine whether alemtuzumab (AL) induction abolishes this discrepancy and improves allograft survival in African American recipients. A retrospective chart review of consecutive adult renal transplants was conducted between 2006 and 2014. Kaplan-Meier analysis and hazard ratios were calculated for the African Americans (AA) and white groups. Multiple linear regressions were performed to assess independent variables (race, retransplant, sex, donor type, induction agent) on allograft survival. A significant difference in allograft survival was identified between whites (n = 272) and AA (n = 445), with AA experiencing more graft losses (18.2% vs 12.1%, P = 0.0351). Induction with AL improved outcomes in all transplant recipients. Multiple linear regression identified that the strongest predictor of allograft failure was induction without AL (P < 0.0001). The data for a subset analysis matched for follow-up length demonstrated that whites compared with AA (n = 157, 67 whites and 90 AA) had lower rates of allograft failure in the absence of AL induction (14.9% vs 44.4%, P = 0.0156, hazard ratio = 2.077). In contrast, AL induction (n = 275, 105 whites and 170 AA) eliminated the racial disparity in allograft failure (5.7% vs 9.4%, P = 0.8248, hazard ratio = 1.504). This is the first study to describe the effects of AL induction therapy on AA renal transplant recipients beyond the first posttransplant year. Our early results suggest that AL induction therapy abolishes the disparity in renal allograft failure.
Gondo, Tatsuo; Ohno, Yoshio; Nakashima, Jun; Hashimoto, Takeshi; Nakagami, Yoshihiro; Tachibana, Masaaki
2017-02-01
To identify preoperative factors correlated with postoperative early renal function in patients who had undergone radical cystectomy (RC) and intestinal urinary diversion. We retrospectively identified 201 consecutive bladder cancer patients without distant metastasis who had undergone RC at our institution between 2003 and 2012. The estimated glomerular filtration rate (eGFR) was calculated using the modified Chronic Kidney Disease Epidemiology equation before RC and 3 months following RC. Univariate and stepwise multiple linear regression analyses were applied to estimate postoperative renal function and to identify significant preoperative predictors of postoperative renal function. Patients who had undergone intestinal urinary diversion and were available for the collection of follow-up data (n = 164) were eligible for the present study. Median preoperative and postoperative eGFRs were 69.7 (interquartile range [IQR] 56.3-78.0) and 70.7 (IQR 57.3-78.1), respectively. In univariate analyses, age, preoperative proteinuria, thickness of abdominal subcutaneous fat tissue (TSF), preoperative serum creatinine level, preoperative eGFR, and urinary diversion type were significantly associated with postoperative eGFR. In a stepwise multiple linear regression analysis, preoperative eGFR, age, and TSF were significant factors for predicting postoperative eGFR (p < 0.001, p = 0.02, and p = 0.046, respectively). The estimated postoperative eGFRs correlated well with the actual postoperative eGFRs (r = 0.65, p < 0.001). Preoperative eGFR, age, and TSF were independent preoperative factors for determining postoperative renal function in patients who had undergone RC and intestinal urinary diversion. These results may be used for patient counseling before surgery, including the planning of perioperative chemotherapy administration.
NASA Technical Reports Server (NTRS)
Burke, Michael W.; Judge, Russell A.; Pusey, Marc L.; Rose, M. Franklin (Technical Monitor)
2000-01-01
Full factorial experiment design incorporating multi-linear regression analysis of the experimental data allows the main trends and effects to be quickly identified while using only a limited number of experiments. These techniques were used to identify the effect of precipitant concentration and the presence of an impurity, the physiological lysozyme dimer, on the nucleation rate and crystal dimensions of the tetragonal form of chicken egg white lysozyme. Increasing precipitant concentration was found to decrease crystal numbers, the magnitude of this effect also depending on the supersaturation. The presence of the dimer generally increased nucleation. The crystal axial ratio decreased with increasing precipitant concentration independent of impurity.
Avrutin, Egor; Moisey, Lesley L; Zhang, Roselyn; Khattab, Jenna; Todd, Emma; Premji, Tahira; Kozar, Rosemary; Heyland, Daren K; Mourtzakis, Marina
2017-12-06
Computed tomography (CT) scans performed during routine hospital care offer the opportunity to quantify skeletal muscle and predict mortality and morbidity in intensive care unit (ICU) patients. Existing methods of muscle cross-sectional area (CSA) quantification require specialized software, training, and time commitment that may not be feasible in a clinical setting. In this article, we explore a new screening method to identify patients with low muscle mass. We analyzed 145 scans of elderly ICU patients (≥65 years old) using a combination of measures obtained with a digital ruler, commonly found on hospital radiological software. The psoas and paraspinal muscle groups at the level of the third lumbar vertebra (L3) were evaluated by using 2 linear measures each and compared with an established method of CT image analysis of total muscle CSA in the L3 region. There was a strong association between linear measures of psoas and paraspinal muscle groups and total L3 muscle CSA (R² = 0.745, P < 0.001). Linear measures, age, and sex were included as covariates in a multiple logistic regression to predict those with low muscle mass; receiver operating characteristic (ROC) area under the curve (AUC) of the combined psoas and paraspinal linear index model was 0.920. Intraclass correlation coefficients (ICCs) were used to evaluate intrarater and interrater reliability, resulting in scores of 0.979 (95% CI: 0.940-0.992) and 0.937 (95% CI: 0.828-0.978), respectively. A digital ruler can reliably predict L3 muscle CSA, and these linear measures may be used to identify critically ill patients with low muscularity who are at risk for worse clinical outcomes. © 2017 American Society for Parenteral and Enteral Nutrition.
A kinetic energy model of two-vehicle crash injury severity.
Sobhani, Amir; Young, William; Logan, David; Bahrololoom, Sareh
2011-05-01
An important part of any model of vehicle crashes is the development of a procedure to estimate crash injury severity. After reviewing existing models of crash severity, this paper outlines the development of a modelling approach aimed at measuring the injury severity of people in two-vehicle road crashes. This model can be incorporated into a discrete event traffic simulation model, using simulation model outputs as its input. The model can then serve as an integral part of a simulation model estimating the crash potential of components of the traffic system. The model is developed using Newtonian Mechanics and Generalised Linear Regression. The factors contributing to the speed change (ΔV(s)) of a subject vehicle are identified using the law of conservation of momentum. A Log-Gamma regression model is fitted to measure speed change (ΔV(s)) of the subject vehicle based on the identified crash characteristics. The kinetic energy applied to the subject vehicle is calculated by the model, which in turn uses a Log-Gamma Regression Model to estimate the Injury Severity Score of the crash from the calculated kinetic energy, crash impact type, presence of airbag and/or seat belt and occupant age. Copyright © 2010 Elsevier Ltd. All rights reserved.
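The momentum step described above can be illustrated with a deliberately simplified case. The sketch assumes a one-dimensional, perfectly inelastic collision in which both vehicles leave at a common velocity, and it uses ½·m·ΔV² as a stand-in for the energy measure; neither assumption is taken from the paper, and the function names are hypothetical.

```python
def delta_v_subject(m_subject, v_subject, m_other, v_other):
    """Speed change of the subject vehicle from conservation of momentum.

    Assumes a 1-D, perfectly inelastic collision (common post-impact velocity).
    Masses in kg, velocities in m/s (signed along the collision axis).
    """
    v_common = (m_subject * v_subject + m_other * v_other) / (m_subject + m_other)
    return v_common - v_subject

def delta_v_energy(mass, delta_v):
    """Illustrative energy measure tied to the speed change: 0.5 * m * dV**2 (joules)."""
    return 0.5 * mass * delta_v ** 2

# Hypothetical crash: 1500 kg vehicle at 20 m/s strikes a stationary 1000 kg vehicle
dv = delta_v_subject(1500.0, 20.0, 1000.0, 0.0)
energy = delta_v_energy(1500.0, dv)
```

In this example the common post-impact velocity is 12 m/s, so the subject vehicle's speed change is -8 m/s and the illustrative energy measure is 48 kJ.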
Prunier, J G; Colyn, M; Legendre, X; Nimon, K F; Flamand, M C
2015-01-01
Direct gradient analyses in spatial genetics provide unique opportunities to describe the inherent complexity of genetic variation in wildlife species and are the object of many methodological developments. However, multicollinearity among explanatory variables is a systemic issue in multivariate regression analyses and is likely to cause serious difficulties in properly interpreting results of direct gradient analyses, with the risk of erroneous conclusions, misdirected research and inefficient or counterproductive conservation measures. Using simulated data sets along with linear and logistic regressions on distance matrices, we illustrate how commonality analysis (CA), a detailed variance-partitioning procedure that was recently introduced in the field of ecology, can be used to deal with nonindependence among spatial predictors. By decomposing model fit indices into unique and common (or shared) variance components, CA allows identifying the location and magnitude of multicollinearity, revealing spurious correlations and thus thoroughly improving the interpretation of multivariate regressions. Despite a few inherent limitations, especially in the case of resistance model optimization, this review highlights the great potential of CA to account for complex multicollinearity patterns in spatial genetics and identifies future applications and lines of research. We strongly urge spatial geneticists to systematically investigate commonalities when performing direct gradient analyses. © 2014 John Wiley & Sons Ltd.
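For two correlated predictors, the decomposition into unique and common variance components described above reduces to differences of R² values. The sketch below uses ordinary least squares on simulated vectors rather than regressions on distance matrices, and all names and data are hypothetical.

```python
import numpy as np

def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor vectors plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def commonality_two_predictors(x1, x2, y):
    """Split the full-model R^2 into unique and common (shared) components."""
    r2_full = r_squared([x1, x2], y)
    r2_x1 = r_squared([x1], y)
    r2_x2 = r_squared([x2], y)
    return {
        "unique_x1": r2_full - r2_x2,        # what x1 adds beyond x2
        "unique_x2": r2_full - r2_x1,        # what x2 adds beyond x1
        "common": r2_x1 + r2_x2 - r2_full,   # variance the predictors share
        "total": r2_full,
    }

# Simulated collinear predictors
rng = np.random.default_rng(42)
x1 = rng.standard_normal(200)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(200)
y = x1 + x2 + rng.standard_normal(200)

parts = commonality_two_predictors(x1, x2, y)
```

The three components add up to the full-model R² by construction; a large common component relative to the unique ones flags collinearity between the predictors.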
Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.
Trninić, Marko; Jeličić, Mario; Papić, Vladan
2015-07-01
In kinesiology, medicine, biology and psychology, in which research focus is on dynamical self-organized systems, complex connections exist between variables. The non-linear nature of complex systems has been discussed and explained by the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided into three groups according to their playing time (8 minutes and more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear (general model) and non-linear (general model) regression models were calculated simultaneously and separately for each group. The conclusion is viable: non-linear regressions are frequently superior to linear correlations when interpreting actual association logic among research variables.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
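Quantile regression, as used above for the height-for-age Z-score, replaces the squared-error loss with the asymmetric "pinball" (check) loss. The sketch below shows only the intercept-only case, where minimising that loss recovers an empirical quantile; the toy data and function names are hypothetical, and the additive model with non-linear and spatial effects is not reproduced.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Check loss: residuals above q cost tau each unit, residuals below cost (1 - tau)."""
    diff = y - q
    return float(np.sum(np.where(diff >= 0, tau * diff, (tau - 1.0) * diff)))

def empirical_quantile_by_loss(y, tau):
    """Intercept-only quantile regression: pick the sample value minimising the loss."""
    candidates = np.sort(y)
    losses = [pinball_loss(y, q, tau) for q in candidates]
    return float(candidates[int(np.argmin(losses))])

values = np.arange(1, 101, dtype=float)      # toy sample: 1, 2, ..., 100
q35 = empirical_quantile_by_loss(values, 0.35)
```

For this sample the loss is minimised anywhere between the 35th and 36th ordered values, which is why fitting at several values of tau describes different parts of the outcome distribution.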
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A
2013-01-01
Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, multiple linear regression will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
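The two error indices quoted above, and a percentage improvement of one forecast over another, can be computed as in this minimal sketch. Reading "improve by about 13.52%" as a relative reduction in the error index is an assumption, and the flow values are invented for illustration.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return float(100.0 * np.mean(np.abs((observed - predicted) / observed)))

def percent_improvement(baseline_error, new_error):
    """Relative reduction of an error index with respect to the baseline, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Invented flows (m^3/s): observed, baseline forecast, candidate forecast
observed = np.array([100.0, 200.0, 400.0])
baseline = np.array([110.0, 180.0, 440.0])
candidate = np.array([105.0, 190.0, 420.0])

rmse_gain = percent_improvement(rmse(observed, baseline), rmse(observed, candidate))
mape_gain = percent_improvement(mape(observed, baseline), mape(observed, candidate))
```

Here the candidate forecast halves every error, so both indices improve by 50%.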
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of regional background levels of particulate matter (PM10). These levels are often elevated over urban areas, making them one of the main air pollution concerns. This is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative (log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following Sawa's Bayesian Information Criterion (INT-BIC). Each type of model is calculated using two different periods: a training dataset (6 years) and a testing dataset (1 year). The results of each type of model show that the INT-BIC-based model (R² = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the-art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 for seasonal means) with respect to shorter periods.
O'Leary, Neil; Chauhan, Balwantray C; Artes, Paul H
2012-10-01
To establish a method for estimating the overall statistical significance of visual field deterioration from an individual patient's data, and to compare its performance to pointwise linear regression. The Truncated Product Method was used to calculate a statistic S that combines evidence of deterioration from individual test locations in the visual field. The overall statistical significance (P value) of visual field deterioration was inferred by comparing S with its permutation distribution, derived from repeated reordering of the visual field series. Permutation of pointwise linear regression (PoPLR) and pointwise linear regression were evaluated in data from patients with glaucoma (944 eyes, median mean deviation -2.9 dB, interquartile range: -6.3, -1.2 dB) followed for more than 4 years (median 10 examinations over 8 years). False-positive rates were estimated from randomly reordered series of this dataset, and hit rates (proportion of eyes with significant deterioration) were estimated from the original series. The false-positive rates of PoPLR were indistinguishable from the corresponding nominal significance levels and were independent of baseline visual field damage and length of follow-up. At P < 0.05, the hit rates of PoPLR were 12, 29, and 42%, at the fifth, eighth, and final examinations, respectively, and at matching specificities they were consistently higher than those of pointwise linear regression. In contrast to population-based progression analyses, PoPLR provides a continuous estimate of statistical significance for visual field deterioration individualized to a particular patient's data. This allows close control over specificity, essential for monitoring patients in clinical practice and in clinical trials.
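The permutation procedure described above can be sketched roughly as follows. This is an illustration, not the authors' implementation: the one-sided test for negative slopes, the truncation cutoff of 0.05, the number of permutations, and the simulated series are all assumptions, and the sketch relies on NumPy and SciPy being available.

```python
import numpy as np
from scipy import stats

def pointwise_p_values(series, times):
    """One-sided p-values (negative slope) from OLS at each test location.

    series has shape (n_visits, n_locations); times has shape (n_visits,).
    """
    n = len(times)
    t_c = times - times.mean()
    sxx = np.sum(t_c ** 2)
    means = series.mean(axis=0)
    slopes = t_c @ (series - means) / sxx
    resid = series - (means + np.outer(t_c, slopes))
    se = np.sqrt(np.sum(resid ** 2, axis=0) / (n - 2) / sxx)
    return stats.t.cdf(slopes / se, df=n - 2)

def truncated_stat(p, cutoff=0.05):
    """Truncated product statistic: S = -sum(log p) over locations with p <= cutoff."""
    p = np.clip(p, 1e-300, 1.0)
    return float(-np.sum(np.log(p[p <= cutoff])))

def poplr_p_value(series, times, n_perm=200, seed=0):
    """Overall p-value: compare S with its distribution under reordered visits."""
    rng = np.random.default_rng(seed)
    s_obs = truncated_stat(pointwise_p_values(series, times))
    s_perm = np.array([
        truncated_stat(pointwise_p_values(series[rng.permutation(len(times))], times))
        for _ in range(n_perm)
    ])
    return (1 + np.sum(s_perm >= s_obs)) / (1 + n_perm)

# Simulated deteriorating field: 10 visits, 20 locations, -1 dB per visit plus noise
rng = np.random.default_rng(1)
times = np.arange(10, dtype=float)
series = 30.0 - 1.0 * times[:, None] + 0.5 * rng.standard_normal((10, 20))
p_overall = poplr_p_value(series, times)
```

Reordering whole visits keeps the spatial correlation between locations intact, which is what allows a single overall significance level to be read from the permutation distribution.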
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least squares) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.
Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W
1992-03-01
The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy of estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by the Additive Main effects and Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment recommended AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
NASA Astrophysics Data System (ADS)
Gusriani, N.; Firdaniza
2018-03-01
The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least squares method is nevertheless applied to such data, it produces a model that fails to represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.
Competing regression models for longitudinal data.
Alencar, Airlane P; Singer, Julio M; Rocha, Francisco Marcelo M
2012-03-01
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretest-posttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Depressive disorder in pregnant Latin women: does intimate partner violence matter?
Fonseca-Machado, Mariana de Oliveira; Alves, Lisiane Camargo; Monteiro, Juliana Cristina Dos Santos; Stefanello, Juliana; Nakano, Ana Márcia Spanó; Haas, Vanderlei José; Gomes-Sponholz, Flávia
2015-05-01
To identify the association of antenatal depressive symptoms with intimate partner violence during the current pregnancy in Brazilian women. Intimate partner violence is an important risk factor for antenatal depression. To the authors' knowledge, there has been no study to date that assessed the association between intimate partner violence during pregnancy and antenatal depressive symptoms among Brazilian women. Cross-sectional study. Three hundred and fifty-eight pregnant women were enrolled in the study. The Edinburgh Postnatal Depression Scale and an adapted version of the instrument used in the World Health Organization Multi-country Study on Women's Health and Domestic Violence were used to measure antenatal depressive symptoms and psychological, physical and sexual acts of intimate partner violence during the current pregnancy, respectively. Multiple logistic regression and multiple linear regression were used for data analysis. The prevalence of antenatal depressive symptoms, as determined by the cut-off score of 12 in the Edinburgh Postnatal Depression Scale, was 28·2% (101). Of the participants, 63 (17·6%) reported some type of intimate partner violence during pregnancy. Among them, 60 (95·2%) reported suffering psychological violence, 23 (36·5%) physical violence and one (1·6%) sexual violence. Multiple logistic regression and multiple linear regression indicated that antenatal depressive symptoms are strongly associated with intimate partner violence during pregnancy. Among Brazilian women, exposure to intimate partner violence during pregnancy increases the chances of experiencing antenatal depressive symptoms. Clinical nurses and nurse midwives should pay attention to the particularities of Brazilian women, especially with regard to the occurrence of intimate partner violence, whose impacts on the mental health of this population are extremely significant, both during the gestational period and postpartum. © 2015 John Wiley & Sons Ltd.
Verhelst, Stefanie; Poppe, Willy A J; Bogers, Johannes J; Depuydt, Christophe E
2017-03-01
This retrospective study examined whether human papillomavirus (HPV) type-specific viral load changes measured in two or three serial cervical smears are predictive for the natural evolution of HPV infections and correlate with histological grades of cervical intraepithelial neoplasia (CIN), allowing triage of HPV-positive women. A cervical histology database was used to select consecutive women with biopsy-proven CIN in 2012 who had at least two liquid-based cytology samples before the diagnosis of CIN. Before performing cytology, 18 different quantitative PCRs allowed HPV type-specific viral load measurement. Changes in HPV-specific load between measurements were assessed by linear regression, with calculation of coefficient of determination (R) and slope. All infections could be classified into one of five categories: (i) clonal progressing process (R≥0.85; positive slope), (ii) simultaneously occurring clonal progressive and transient infection, (iii) clonal regressing process (R≥0.85; negative slope), (iv) serial transient infection with latency [R<0.85; slopes (two points) between 0.0010 and -0.0010 HPV copies/cell/day], and (v) transient productive infection (R<0.85; slope: ±0.0099 HPV copies/cell/day). Three hundred and seven women with CIN were included; 124 had single-type infections and 183 had multiple HPV types. Only with three consecutive measurements could a clonal process be identified in all CIN3 cases. We could clearly demonstrate clonal regressing lesions with a persistent linear decrease in viral load (R≥0.85; -0.003 HPV copies/cell/day) in all CIN categories. Type-specific viral load increase/decrease in three consecutive measurements enabled classification of CIN lesions in clonal HPV-driven transformation (progression/regression) and nonclonal virion-productive (serial transient/transient) processes.
Regression to fuzziness method for estimation of remaining useful life in power plant components
NASA Astrophysics Data System (ADS)
Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.
2014-10-01
Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data is not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions, which was shown to perform equally to or worse than the presented methodology. Furthermore, the methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate fuzzy sets for parameter representation.
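The RUL definition above, combined with the simple linear regression benchmark the abstract mentions, can be illustrated with a minimal sketch. It does not reproduce the fuzzy-set part of the methodology. The function names, the degradation readings, and the failure threshold are all hypothetical, and the sketch assumes a degradation parameter that increases toward a known failure point.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = intercept + slope * time."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return intercept, slope


def remaining_useful_life(times, values, failure_threshold):
    """Extrapolate the fitted trend to the failure threshold.

    The RUL is the estimated failure time minus the time of the
    most recent measurement.
    """
    intercept, slope = fit_line(times, values)
    if slope <= 0:
        raise ValueError("fitted trend does not approach the failure threshold")
    failure_time = (failure_threshold - intercept) / slope
    return failure_time - times[-1]


# Hypothetical degradation readings (arbitrary units) taken at hours 0..4.
hours = [0.0, 1.0, 2.0, 3.0, 4.0]
wear = [1.0, 1.5, 2.0, 2.5, 3.0]
rul = remaining_useful_life(hours, wear, failure_threshold=6.0)
print(f"estimated RUL: {rul:.1f} hours")
```

With few or noisy readings the fitted slope is unstable, which is the situation in which the abstract reports that expert knowledge encoded as fuzzy sets helps.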
Huang, Li-Shan; Myers, Gary J.; Davidson, Philip W.; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W.; Cernichiari, Elsa; Shamlaye, Conrad F.; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W.
2007-01-01
Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age nine years. The analyses for the most recent nine-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated nonlinearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of twenty-one endpoints available at age nine years, we chose the Wechsler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other nine-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels. PMID:17942158
Orthogonal Projection in Teaching Regression and Financial Mathematics
ERIC Educational Resources Information Center
Kachapova, Farida; Kachapov, Ilias
2010-01-01
Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
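The geometric reading described above can be checked numerically. The sketch below is an illustration, not the authors' teaching material: it fits a simple regression on made-up data and verifies that the residual vector is orthogonal to both the constant vector and the predictor vector, so the fitted values are the orthogonal projection of y onto their span. All names and data values are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def least_squares_projection(x, y):
    """Return fitted values and residuals of the OLS fit y ~ intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * a for a in x]
    residual = [b - f for b, f in zip(y, fitted)]
    return fitted, residual


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted, residual = least_squares_projection(x, y)

ones = [1.0] * len(x)
orth_to_ones = dot(residual, ones)  # zero up to rounding error
orth_to_x = dot(residual, x)        # zero up to rounding error

# Pythagoras: |y|^2 = |fitted|^2 + |residual|^2 for an orthogonal projection.
pythagoras_gap = dot(y, y) - dot(fitted, fitted) - dot(residual, residual)
print(orth_to_ones, orth_to_x, pythagoras_gap)
```

The squared length of the residual vector is the minimized distance between y and the plane spanned by the two vectors, which is the quantity the abstract calls the estimation error.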
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Multi-variant study of obesity risk genes in African Americans: The Jackson Heart Study.
Liu, Shijian; Wilson, James G; Jiang, Fan; Griswold, Michael; Correa, Adolfo; Mei, Hao
2016-11-30
Genome-wide association study (GWAS) has been successful in identifying obesity risk genes by single-variant association analysis. For this study, we designed steps of analysis strategy and aimed to identify multi-variant effects on obesity risk among candidate genes. Our analyses were focused on 2137 African American participants with body mass index measured in the Jackson Heart Study and 657 common single nucleotide polymorphisms (SNPs) genotyped at 8 GWAS-identified obesity risk genes. Single-variant association test showed that no SNPs reached significance after multiple testing adjustment. The following gene-gene interaction analysis, which was focused on SNPs with unadjusted p-value<0.10, identified 6 significant multi-variant associations. Logistic regression showed that SNPs in these associations did not have significant linear interactions; examination of genetic risk score evidenced that 4 multi-variant associations had significant additive effects of risk SNPs; and haplotype association test presented that all multi-variant associations contained one or several combinations of particular alleles or haplotypes, associated with increased obesity risk. Our study evidenced that obesity risk genes generated multi-variant effects, which can be additive or non-linear interactions, and multi-variant study is an important supplement to existing GWAS for understanding genetic effects of obesity risk genes. Copyright © 2016 Elsevier B.V. All rights reserved.
Paternal mental health and socioemotional and behavioral development in their children.
Kvalevaag, Anne Lise; Ramchandani, Paul G; Hove, Oddbjørn; Assmus, Jörg; Eberhard-Gran, Malin; Biringer, Eva
2013-02-01
To examine the association between symptoms of psychological distress in expectant fathers and socioemotional and behavioral outcomes in their children at age 36 months. The current study is based on data from the Norwegian Mother and Child Cohort Study on 31 663 children. Information about fathers' mental health was obtained by self-report (Hopkins Symptom Checklist) in week 17 or 18 of gestation. Information about mothers' pre- and postnatal mental health and children's socioemotional and behavioral development at 36 months of age was obtained from parent-report questionnaires. Linear multiple regression and logistic regression models were performed while controlling for demographics, lifestyle variables, and mothers' mental health. Three percent of the fathers had high levels of psychological distress. Using linear regression models, we found a small positive association between fathers' psychological distress and children's behavioral difficulties, B = 0.19 (95% confidence interval [CI] = 0.15-0.23); emotional difficulties, B = 0.22 (95% CI = 0.18-0.26); and social functioning, B = 0.12 (95% CI = 0.07-0.16). The associations did not change when adjusted for relevant confounders. Children whose fathers had high levels of psychological distress had higher levels of emotional and behavioral problems. This study suggests that some risk of future child emotional, behavioral, and social problems can be identified during pregnancy. The findings are of importance for clinicians and policy makers in their planning of health care in the perinatal period because this represents a significant opportunity for preventive intervention.
Does higher education protect against obesity? Evidence using Mendelian randomization.
Böckerman, Petri; Viinikainen, Jutta; Pulkki-Råback, Laura; Hakulinen, Christian; Pitkänen, Niina; Lehtimäki, Terho; Pehkonen, Jaakko; Raitakari, Olli T
2017-08-01
The aim of this explorative study was to examine the effect of education on obesity using Mendelian randomization. Participants (N=2011) were from the on-going nationally representative Young Finns Study (YFS) that began in 1980 when six cohorts (aged 30, 33, 36, 39, 42 and 45 in 2007) were recruited. The average value of BMI (kg/m²) measurements in 2007 and 2011 and genetic information were linked to comprehensive register-based information on the years of education in 2007. We first used a linear regression (Ordinary Least Squares, OLS) to estimate the relationship between education and BMI. To identify a causal relationship, we exploited Mendelian randomization and used a genetic score as an instrument for education. The genetic score was based on 74 genetic variants that genome-wide association studies (GWASs) have found to be associated with the years of education. Because the genotypes are randomly assigned at conception, the instrument causes exogenous variation in the years of education and thus enables identification of causal effects. The years of education in 2007 were associated with lower BMI in 2007/2011 (regression coefficient (b)=-0.22; 95% Confidence Intervals [CI]=-0.29, -0.14) according to the linear regression results. The results based on Mendelian randomization suggest that there may be a negative causal effect of education on BMI (b=-0.84; 95% CI=-1.77, 0.09). The findings indicate that education could be a protective factor against obesity in advanced countries. Copyright © 2017 Elsevier Inc. All rights reserved.
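The contrast between the OLS estimate and the instrumental-variable estimate can be sketched on simulated data. This is not the study's data or code: the variable names, effect sizes, and the unobserved confounder are assumptions chosen for illustration. With a single instrument, the IV estimate reduces to the ratio cov(instrument, outcome) / cov(instrument, exposure).

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
true_effect = -0.5       # hypothetical causal effect of exposure on outcome
n = 20000

score, exposure, outcome = [], [], []
for _ in range(n):
    z = rng.gauss(0, 1)                  # instrument (e.g. a genetic score)
    u = rng.gauss(0, 1)                  # unobserved confounder
    x = z + u + rng.gauss(0, 1)          # exposure depends on z and u
    y = true_effect * x + 2.0 * u + rng.gauss(0, 1)  # outcome depends on x and u
    score.append(z)
    exposure.append(x)
    outcome.append(y)

# OLS slope is biased because u drives both exposure and outcome.
ols_slope = covariance(exposure, outcome) / covariance(exposure, exposure)

# IV slope uses only the variation in exposure induced by the instrument.
iv_slope = covariance(score, outcome) / covariance(score, exposure)

print(f"OLS: {ols_slope:.2f}  IV: {iv_slope:.2f}  true: {true_effect}")
```

The IV estimate is consistent only if the instrument affects the outcome solely through the exposure, and it is typically less precise than OLS, as the wider confidence interval in the abstract shows.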
Guertin, Kristin A; Loftfield, Erikka; Boca, Simina M; Sampson, Joshua N; Moore, Steven C; Xiao, Qian; Huang, Wen-Yi; Xiong, Xiaoqin; Freedman, Neal D; Cross, Amanda J; Sinha, Rashmi
2015-05-01
Coffee intake may be inversely associated with colorectal cancer; however, previous studies have been inconsistent. Serum coffee metabolites are integrated exposure measures that may clarify associations with cancer and elucidate underlying mechanisms. Our aims were 2-fold as follows: 1) to identify serum metabolites associated with coffee intake and 2) to examine these metabolites in relation to colorectal cancer. In a nested case-control study of 251 colorectal cancer cases and 247 matched control subjects from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, we conducted untargeted metabolomics analyses of baseline serum by using ultrahigh-performance liquid-phase chromatography-tandem mass spectrometry and gas chromatography-mass spectrometry. Usual coffee intake was self-reported in a food-frequency questionnaire. We used partial Pearson correlations and linear regression to identify serum metabolites associated with coffee intake and conditional logistic regression to evaluate associations between coffee metabolites and colorectal cancer. After Bonferroni correction for multiple comparisons (P = 0.05 ÷ 657 metabolites), 29 serum metabolites were positively correlated with coffee intake (partial correlation coefficients: 0.18-0.61; P < 7.61 × 10(-5)); serum metabolites most highly correlated with coffee intake (partial correlation coefficients >0.40) included trigonelline (N'-methylnicotinate), quinate, and 7 unknown metabolites. Of 29 serum metabolites, 8 metabolites were directly related to caffeine metabolism, and 3 of these metabolites, theophylline (OR for 90th compared with 10th percentiles: 0.44; 95% CI: 0.25, 0.79; P-linear trend = 0.006), caffeine (OR for 90th compared with 10th percentiles: 0.56; 95% CI: 0.35, 0.89; P-linear trend = 0.015), and paraxanthine (OR for 90th compared with 10th percentiles: 0.58; 95% CI: 0.36, 0.94; P-linear trend = 0.027), were inversely associated with colorectal cancer. 
Serum metabolites can distinguish coffee drinkers from nondrinkers; some caffeine-related metabolites were inversely associated with colorectal cancer and should be studied further to clarify the role of coffee in the cause of colorectal cancer. The Prostate, Lung, Colorectal, and Ovarian trial was registered at clinicaltrials.gov as NCT00002540. © 2015 American Society for Nutrition.
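The Bonferroni threshold quoted in the abstract (P = 0.05 ÷ 657 metabolites) is a one-line calculation. The short sketch below reproduces it and shows how the same rule would be applied to a list of p-values; the function names and the three example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# 657 metabolites tested at a family-wise alpha of 0.05, as in the abstract.
threshold = bonferroni_threshold(0.05, 657)
print(f"{threshold:.2e}")

# Hypothetical p-values from three tests (corrected threshold is 0.05 / 3).
flags = significant_after_bonferroni([0.001, 0.04, 0.5])
```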
Analysis of Learning Curve Fitting Techniques.
1987-09-01
1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et al., Applied
On vertical profile of ozone at Syowa
NASA Technical Reports Server (NTRS)
Chubachi, Shigeru
1994-01-01
The difference in the vertical ozone profile at Syowa between 1966-1981 and 1982-1988 is shown. The month-height cross section of the slope of the linear regressions between ozone partial pressure and 100-mb temperature is also shown. The vertically integrated values of the slopes are in close agreement with the slopes calculated by linear regression of Dobson total ozone on 100-mb temperature in the period of 1982-1988.
Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija
2018-01-01
The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrP(C)) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
Claessens, T E; Georgakopoulos, D; Afanasyeva, M; Vermeersch, S J; Millar, H D; Stergiopulos, N; Westerhof, N; Verdonck, P R; Segers, P
2006-04-01
The linear time-varying elastance theory is frequently used to describe the change in ventricular stiffness during the cardiac cycle. The concept assumes that all isochrones (i.e., curves that connect pressure-volume data occurring at the same time) are linear and have a common volume intercept. Of specific interest is the steepest isochrone, the end-systolic pressure-volume relationship (ESPVR), of which the slope serves as an index for cardiac contractile function. Pressure-volume measurements, achieved with a combined pressure-conductance catheter in the left ventricle of 13 open-chest anesthetized mice, showed a marked curvilinearity of the isochrones. We therefore analyzed the shape of the isochrones by using six regression algorithms (two linear, two quadratic, and two logarithmic, each with a fixed or time-varying intercept) and discussed the consequences for the elastance concept. Our main observations were 1) the volume intercept varies considerably with time; 2) isochrones are equally well described by using quadratic or logarithmic regression; 3) linear regression with a fixed intercept shows poor correlation (R(2) < 0.75) during isovolumic relaxation and early filling; and 4) logarithmic regression is superior in estimating the fixed volume intercept of the ESPVR. In conclusion, the linear time-varying elastance fails to provide a sufficiently robust model to account for changes in pressure and volume during the cardiac cycle in the mouse ventricle. A new framework accounting for the nonlinear shape of the isochrones needs to be developed.
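The kind of model comparison described above, linear versus logarithmic regression of pressure on volume, can be sketched by comparing coefficients of determination. The data below are synthetic and generated from a logarithmic curve purely to illustrate the comparison. They are not mouse pressure-volume measurements, and the units, names, and coefficients are hypothetical.

```python
import math


def r_squared(x, y):
    """Coefficient of determination of the OLS fit y = a + b * x.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Synthetic isochrone: pressure follows a logarithmic function of volume.
volumes = [10.0, 15.0, 20.0, 30.0, 45.0, 60.0]             # hypothetical units
pressures = [5.0 + 20.0 * math.log(v) for v in volumes]    # hypothetical curve

r2_linear = r_squared(volumes, pressures)                      # P = a + b * V
r2_log = r_squared([math.log(v) for v in volumes], pressures)  # P = a + b * ln(V)

print(f"linear R^2 = {r2_linear:.3f}, logarithmic R^2 = {r2_log:.3f}")
```

A quadratic fit would be compared in the same way, but it needs a two-predictor least-squares solve and is omitted here to keep the sketch short.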
Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M
2017-06-01
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
Wan, Chao; Hao, Zhixiu; Wen, Shizhu; Leng, Huijie
2014-01-01
The mechanical properties of ligaments are key contributors to the stability and function of musculoskeletal joints. Ligaments are generally composed of ground substance, collagen (mainly type I and III collagen), and minimal elastin fibers. However, no consensus has been reached about whether the distribution of different types of collagen correlates with the mechanical behaviors of ligaments. The main objective of this study was to determine whether the collagen type distribution is correlated with the mechanical properties of ligaments. Using axial tensile tests and picrosirius red staining-polarization observations, the mechanical behaviors and the ratios of the various types of collagen were investigated for twenty-four rabbit medial collateral ligaments from twenty-four rabbits of different ages, respectively. One-way analysis of variance was used in the comparison of the Young's modulus in the linear region of the stress-strain curves and the ratios of type I and III collagen for the specimens (the mid-substance specimens of the ligaments) with different ages. A multiple linear regression was performed using the collagen contents (the ratios of type I and III collagen) and the Young's modulus of the specimens. During the maturation of the ligaments, the type I collagen content increased, and the type III collagen content decreased. A significant and strong correlation (R² = 0.839, P < 0.05) was identified by multiple linear regression between the collagen contents (i.e., the ratios of type I and type III collagen) and the mechanical properties of the specimens. The collagen content of ligaments might provide a new perspective for evaluating the linear modulus of global stress-strain curves for ligaments and open a new door for studying the mechanical behaviors and functions of connective tissues. PMID:25062068
The impact of intrinsic and extrinsic factors on the job satisfaction of dentists.
Goetz, K; Campbell, S M; Broge, B; Dörfer, C E; Brodowski, M; Szecsenyi, J
2012-10-01
The Two-Factor Theory of job satisfaction distinguishes between intrinsic-motivation (i.e. recognition, responsibility) and extrinsic-hygiene (i.e. job security, salary, working conditions) factors. The presence of intrinsic-motivation facilitates higher satisfaction and performance, whereas the absences of extrinsic factors help mitigate against dissatisfaction. The consideration of these factors and their impact on dentists' job satisfaction is essential for the recruitment and retention of dentists. The objective of the study is to assess the level of job satisfaction of German dentists and the factors that are associated with it. This cross-sectional study was based on a job satisfaction survey. Data were collected from 147 dentists working in 106 dental practices. Job satisfaction was measured with the 10-item Warr-Cook-Wall job satisfaction scale. Organizational characteristics were measured with two items. Linear regression analyses were performed in which each of the nine items of the job satisfaction scale (excluding overall satisfaction) was handled as a dependent variable. A stepwise linear regression analysis was performed with overall job satisfaction as the dependent outcome variable, the nine items of job satisfaction and the two items of organizational characteristics controlled for age and gender as predictors. The response rate was 95.0%. Dentists were satisfied with 'freedom of working method' and mostly dissatisfied with their 'income'. Both variables are extrinsic factors. The regression analyses identified five items that were significantly associated with each item of the job satisfaction scale: 'age', 'mean weekly working time', 'period in the practice', 'number of dentist's assistant' and 'working atmosphere'. Within the stepwise linear regression analysis the intrinsic factor 'opportunity to use abilities' (β = 0.687) showed the highest score of explained variance (R² = 0.468) regarding overall job satisfaction.
With respect to the Two-Factor Theory of job satisfaction both components, intrinsic and extrinsic, are essential for dentists but the presence of intrinsic motivating factors like the opportunity to use abilities has most positive impact on job satisfaction. The findings of this study will be helpful for further activities to improve the working conditions of dentists and to ensure quality of care. © 2012 John Wiley & Sons A/S.
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. 
This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
Lin, Lei; Wang, Qian; Sadek, Adel W
2016-06-01
The duration of freeway traffic accidents is an important factor, which affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and given this, HBDMs have been previously applied to analyze and predict traffic accident duration. Previous research, however, has not yet applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM, and the remainder of the data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs. 
Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
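The mean absolute percentage error (MAPE) used above to rank the three models can be sketched in a few lines. The function name `mape` and the duration values below are illustrative assumptions; they are not taken from the study's datasets.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the observed value.
    relative_errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical accident durations in minutes (not from the paper).
observed = [30.0, 60.0, 90.0]
predicted = [33.0, 54.0, 99.0]
print(round(mape(observed, predicted), 2))
```

Because every error is divided by the observed duration, MAPE weights a 3-minute miss on a short accident as heavily as a 9-minute miss on one three times as long.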
Application of General Regression Neural Network to the Prediction of LOD Change
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao
2012-01-01
Traditional methods for predicting the change in length of day (LOD change) are mainly based on linear models, such as the least squares model and the autoregression model. However, the LOD change involves complicated non-linear factors, and the predictions of linear models are often unsatisfactory. Thus, a non-linear neural network, the general regression neural network (GRNN) model, is applied to the prediction of the LOD change, and the result is compared with the predictions obtained from the BP (back propagation) neural network model and other models. The comparison shows that applying the GRNN to the prediction of the LOD change is highly effective and feasible.
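A GRNN prediction is, in essence, a Gaussian-kernel weighted average of the training targets. The sketch below assumes a one-dimensional input and a hand-picked smoothing parameter `sigma`; the function name and the toy series are hypothetical and are not the LOD data or the authors' implementation.

```python
import math


def grnn_predict(x_train, y_train, x_new, sigma=1.0):
    """Kernel-weighted average of training targets (one-dimensional input)."""
    # Pattern layer: one Gaussian weight per training sample.
    weights = [math.exp(-((x_new - x) ** 2) / (2.0 * sigma ** 2)) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    # Summation and output layers: weighted sum divided by the sum of weights.
    return sum(w * y for w, y in zip(weights, y_train)) / total


# Toy non-linear series (assumed for illustration only).
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_y = [0.0, 1.0, 4.0, 9.0]
print(grnn_predict(toy_x, toy_y, 1.5, sigma=0.5))
```

The only tunable quantity is `sigma`: a small value follows the nearest training points closely, while a large value smooths the prediction toward the overall mean of the targets.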
Wong, William W.; Strizich, Garrett; Heo, Moonseong; Heymsfield, Steven B.; Himes, John H.; Rock, Cheryl L.; Gellman, Marc D.; Siega-Riz, Anna Maria; Sotres-Alvarez, Daniela; Davis, Sonia M.; Arredondo, Elva M.; Van Horn, Linda; Wylie-Rosett, Judith; Sanchez-Johnsen, Lisa; Kaplan, Robert; Mossavar-Rahmani, Yasmin
2016-01-01
Objective To evaluate the percentage of body fat (%BF)-BMI relationship, identify %BF levels corresponding to adult BMI cut-points, and examine %BF-BMI agreement in a diverse Hispanic/Latino population. Methods %BF by bioelectrical impedance analysis (BIA) was corrected against %BF by 18O dilution in 476 participants of the ancillary Hispanic Community Health/Latinos Studies. Corrected %BF values were regressed against 1/BMI in the parent study (n=15,261), fitting models for each age group, by sex and Hispanic/Latino background; predicted %BF was then computed for each BMI cut-point. Results BIA underestimated %BF by 8.7 ± 0.3% in women and 4.6 ± 0.3% in men (P < 0.0001). The %BF-BMI relationship was non-linear and linear for 1/BMI. Sex- and age-specific regression parameters between %BF and 1/BMI were consistent across Hispanic/Latino backgrounds (P > 0.05). The precision of the %BF-1/BMI association weakened with increasing age in men but not women. The proportion of participants classified as non-obese by BMI but obese by %BF was generally higher among women and older adults (16.4% in women vs. 12.0% in men aged 50-74 y). Conclusions %BF was linearly related to 1/BMI with a consistent relationship across Hispanic/Latino backgrounds. BMI cut-points consistently underestimated the proportion of Hispanics/Latinos with excess adiposity. PMID:27184359
Regression Analysis of Top of Descent Location for Idle-thrust Descents
NASA Technical Reports Server (NTRS)
Stell, Laurel; Bronsvoort, Jesper; McDonald, Greg
2013-01-01
In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. The independent variables cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also recorded or computed post-operations. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajectory parameters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowledge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace. In particular, a model for TOD location that is linear in the independent variables would enable decision support tool human-machine interfaces for which a kinetic approach would be computationally too slow.
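The step from a 3.9 nmi standard deviation to "about 80% of predictions within 5 nmi" can be checked with a short calculation. This sketch assumes the prediction errors are normally distributed with zero mean, which the abstract does not state explicitly; the function name is illustrative.

```python
import math


def prob_within(limit, sd):
    """P(|error| < limit) for a zero-mean normal error with standard deviation sd."""
    return math.erf(limit / (sd * math.sqrt(2.0)))


# Values quoted in the abstract: sd = 3.9 nmi, tolerance = 5 nmi.
p = prob_within(5.0, 3.9)
print(f"{p:.3f}")
```

Under that normality assumption, 5 nmi is about 1.28 standard deviations, which is the familiar two-sided 80% point of the normal distribution.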
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. 
Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
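The "asymmetric function of absolute errors" behind regression quantiles can be illustrated in its simplest, one-sample form: minimizing the summed check loss over a constant recovers a sample quantile. This is only a sketch; the function names and the response values are made up, and the linear-model case (replacing the constant with an intercept and slopes, usually solved by linear programming) is not shown.

```python
def check_loss(tau, residual):
    """Asymmetric absolute loss: tau * r for r >= 0, (1 - tau) * |r| for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def sample_quantile(values, tau):
    """Return the data value that minimizes the summed check loss."""
    return min(values, key=lambda c: sum(check_loss(tau, y - c) for y in values))


response = [2, 4, 4, 5, 7, 9, 10, 12, 20]  # made-up response values
median = sample_quantile(response, 0.5)    # symmetric loss gives the median
upper = sample_quantile(response, 0.9)     # heavier penalty on under-prediction
print(median, upper)
```

With `tau` near 1 the loss penalizes points lying above the fit far more than points below it, which pulls the estimate toward the upper edge of the response distribution, the region the authors argue is most informative about a limiting factor.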
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Spacecraft platform cost estimating relationships
NASA Technical Reports Server (NTRS)
Gruhl, W. M.
1972-01-01
The three main cost areas of unmanned satellite development are discussed. The areas are identified as: (1) the spacecraft platform (SCP), (2) the payload or experiments, and (3) the postlaunch ground equipment and operations. The SCP normally accounts for over half of the total project cost and accurate estimates of SCP costs are required early in project planning as a basis for determining total project budget requirements. The development of single formula SCP cost estimating relationships (CER) from readily available data by statistical linear regression analysis is described. The advantages of single formula CER are presented.
A new approach to assess COPD by identifying lung function break-points
Eriksson, Göran; Jarenbäck, Linnea; Peterson, Stefan; Ankerst, Jaro; Bjermer, Leif; Tufvesson, Ellen
2015-01-01
Purpose COPD is a progressive disease, which can take different routes, leading to great heterogeneity. The aim of the post-hoc analysis reported here was to perform continuous analyses of advanced lung function measurements, using linear and nonlinear regressions. Patients and methods Fifty-one COPD patients with mild to very severe disease (Global Initiative for Chronic Obstructive Lung Disease [GOLD] Stages I–IV) and 41 healthy smokers were investigated post-bronchodilation by flow-volume spirometry, body plethysmography, diffusion capacity testing, and impulse oscillometry. The relationship between COPD severity, based on forced expiratory volume in 1 second (FEV1), and different lung function parameters was analyzed by flexible nonparametric method, linear regression, and segmented linear regression with break-points. Results Most lung function parameters were nonlinear in relation to spirometric severity. Parameters related to volume (residual volume, functional residual capacity, total lung capacity, diffusion capacity [diffusion capacity of the lung for carbon monoxide], diffusion capacity of the lung for carbon monoxide/alveolar volume) and reactance (reactance area and reactance at 5Hz) were segmented with break-points at 60%–70% of FEV1. FEV1/forced vital capacity (FVC) and resonance frequency had break-points around 80% of FEV1, while many resistance parameters had break-points below 40%. The slopes in percent predicted differed; resistance at 5 Hz minus resistance at 20 Hz had a linear slope change of −5.3 per unit FEV1, while residual volume had no slope change above and −3.3 change per unit FEV1 below its break-point of 61%. Conclusion Continuous analyses of different lung function parameters over the spirometric COPD severity range gave valuable information additional to categorical analyses. 
Parameters related to volume, diffusion capacity, and reactance showed break-points around 65% of FEV1, indicating that air trapping starts to dominate in moderate COPD (FEV1 =50%–80%). This may have an impact on the patient’s management plan and selection of patients and/or outcomes in clinical research. PMID:26508849
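A break-point analysis of this kind can be sketched as a grid search: fit one straight line on each side of every candidate split and keep the split with the smallest total squared error. The abstract does not say which algorithm the authors used, so this is an assumed, simplified variant (a continuous "hinge" model is the more common formulation); the data are synthetic and the names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, sse)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def find_break_point(xs, ys, min_points=3):
    """Grid-search the split with the lowest total SSE; xs must be sorted ascending."""
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        a1, b1, sse1 = fit_line(xs[:k], ys[:k])
        a2, b2, sse2 = fit_line(xs[k:], ys[k:])
        if best is None or sse1 + sse2 < best[0]:
            best = (sse1 + sse2, a1, b1, a2, b2)
    _, a1, b1, a2, b2 = best
    if b1 == b2:
        raise ValueError("segments are parallel; no break-point found")
    # Report the break-point as the x value where the two fitted lines cross.
    return (a2 - a1) / (b1 - b2), b1, b2


# Synthetic parameter: flat above 60% of predicted FEV1, rising by 3 per unit below it.
fev1 = list(range(20, 105, 5))
param = [100.0 + 3.0 * max(0, 60 - x) for x in fev1]
break_point, slope_below, slope_above = find_break_point(fev1, param)
print(break_point, slope_below, slope_above)
```

On real, noisy measurements the estimated break-point would carry uncertainty, so a confidence interval (for example from bootstrapping) would be needed before interpreting it clinically.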
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
Third molar development (TMD) has been widely utilized as one of the radiographic methods for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated into the TMD regression model. This study aims to evaluate the performance of dental age estimation in the individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data were divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² increased in the linear regressions of the combined model as compared to the individual models. An overall decrease in RMSE was detected in the combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, a low value of adjusted R² and a high RMSE, except in males, were exhibited in the combined model. Dental age estimation is better predicted using the combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
40 CFR 1066.220 - Linearity verification for chassis dynamometer systems.
Code of Federal Regulations, 2014 CFR
2014-07-01
... dynamometer speed and torque at least as frequently as indicated in Table 1 of § 1066.215. The intent of... linear regression and the linearity criteria specified in Table 1 of this section. (b) Performance requirements. If a measurement system does not meet the applicable linearity criteria in Table 1 of this...
ERIC Educational Resources Information Center
Hovardas, Tasos
2016-01-01
Although ecological systems at varying scales involve non-linear interactions, learners insist on thinking in a linear fashion when they deal with ecological phenomena. The overall objective of the present contribution was to propose a hypothetical learning progression for developing non-linear reasoning in prey-predator systems and to provide…
ERIC Educational Resources Information Center
Ker, H. W.
2014-01-01
Multilevel data are very common in educational research. Hierarchical linear models/linear mixed-effects models (HLMs/LMEs) are now often used to analyze multilevel data. This paper discusses the problems of utilizing ordinary regressions for modeling multilevel educational data, compares the data analytic results from three regression…
Grandke, Fabian; Singh, Priyanka; Heuven, Henri C M; de Haan, Jorn R; Metzler, Dirk
2016-08-24
Association studies are an essential part of modern plant breeding, but are limited for polyploid crops. The increased number of possible genotype classes complicates the differentiation between them. Available methods are limited with respect to the ploidy level or data producing technologies. While genotype classification is an established noise reduction step in diploids, it gains complexity with increasing ploidy levels. Eventually, the errors produced by misclassifications exceed the benefits of genotype classes. Alternatively, continuous genotype values can be used for association analysis in higher polyploids. We associated continuous genotypes to three different traits and compared the results to the output of the genotype caller SuperMASSA. Linear, Bayesian and partial least squares regression were applied, to determine if the use of continuous genotypes is limited to a specific method. A disease, a flowering and a growth trait with h² of 0.51, 0.78 and 0.91 were associated with hexaploid chrysanthemum genotypes. The data set consisted of 55,825 probes and 228 samples. We were able to detect associating probes using continuous genotypes for multiple traits, using different regression methods. The identified probe sets were overlapping, but not identical between the methods. Bayesian regression was the most restrictive method, resulting in ten probes for one trait and none for the others. Linear and partial least squares regression led to numerous associating probes. Association based on genotype classes resulted in similar values, but missed several significant probes. A simulation study was used to successfully validate the number of associating markers. Association of various phenotypic traits with continuous genotypes is successful with both uni- and multivariate regression methods. Genotype calling does not improve the association and shows no advantages in this study. 
Instead, use of continuous genotypes simplifies the analysis, saves computational time and results in more potential markers.
Artes, Paul H; Crabb, David P
2010-01-01
To investigate why the specificity of the Moorfields Regression Analysis (MRA) of the Heidelberg Retina Tomograph (HRT) varies with disc size, and to derive accurate normative limits for neuroretinal rim area to address this problem. Two datasets from healthy subjects (Manchester, UK, n = 88; Halifax, Nova Scotia, Canada, n = 75) were used to investigate the physiological relationship between the optic disc and neuroretinal rim area. Normative limits for rim area were derived by quantile regression (QR) and compared with those of the MRA (derived by linear regression). Logistic regression analyses were performed to quantify the association between disc size and positive classifications with the MRA, as well as with the QR-derived normative limits. In both datasets, the specificity of the MRA depended on optic disc size. The odds of observing a borderline or outside-normal-limits classification increased by approximately 10% for each 0.1 mm² increase in disc area (P < 0.1). The lower specificity of the MRA with large optic discs could be explained by the failure of linear regression to model the extremes of the rim area distribution (observations far from the mean). In comparison, the normative limits predicted by QR were larger for smaller discs (less specific, more sensitive), and smaller for larger discs, such that false-positive rates became independent of optic disc size. Normative limits derived by quantile regression appear to remove the size-dependence of specificity with the MRA. Because quantile regression does not rely on the restrictive assumptions of standard linear regression, it may be a more appropriate method for establishing normative limits in other clinical applications where the underlying distributions are nonnormal or have nonconstant variance.
Eshriqui, Ilana; Vilela, Ana Amélia Freitas; Rebelo, Fernanda; Farias, Dayana Rodrigues; Castro, Maria Beatriz Trindade; Kac, Gilberto
2016-02-01
To identify gestational dietary patterns and evaluate the association between these patterns and the blood pressure (BP) rate of change during pregnancy and the postpartum. Prospective cohort study composed of 191 healthy pregnant women. Systolic BP (SBP) and diastolic BP (DBP) were obtained at the 5th-13th, 20th-26th, 30th-36th gestational weeks, and with 30-45 days postpartum. A food frequency questionnaire administered at the 30th-36th gestational week was used to measure dietary intake during pregnancy. Principal component analysis was performed to identify the dietary patterns. A longitudinal linear mixed-effects regression model was used to evaluate the association between the dietary patterns and BP (adjusted for time elapsed after conception and the women's age, education, parity, body mass index and total energy intake). Three gestational dietary patterns were identified: healthy, common-Brazilian and processed. SBP/DBP mean values (SD) were 110.1 (9.0)/66.9 (7.5), 108.7 (9.0)/64.9 (6.7), 111.3 (9.2)/67.0 (6.9) and 115.0 (10.7)/73.7 (8.6) mmHg at the first, second and third gestational trimesters and postpartum, respectively. Women with higher/lower adherence to the processed pattern presented SBP of 117.9 and 113.0 mmHg (P = 0.037), respectively, during postpartum. No association was found between any of the three dietary patterns and SBP in the multiple longitudinal linear regression models, whereas 1 SD increase in the common-Brazilian pattern was associated with a small change of DBP (β = 0.0006; 95% CI 4.66e-06, 0.001; P = 0.048). The three dietary patterns identified revealed no association with changes of SBP and DBP levels during pregnancy and at early postpartum in this sample of healthy Brazilian women.
Shimizu, Takamasa; Omokawa, Shohei; Akahane, Manabu; Murata, Keiichi; Nakano, Kenichi; Kawamura, Kenji; Tanaka, Yasuhito
2012-06-01
Plate and screw fixation was introduced for complex fractures of the hand. Several risk factors for a poor functional outcome have been identified, but there is a paucity of evidence regarding predictors of finger stiffness in difficult hand fractures. The purpose of this prospective cohort study was to identify independent prognostic factors of the postoperative total active motion (%TAM) in the treatment of metacarpal and phalangeal fractures. Seventy-two patients (62 males, 10 females; 37±15 years) with periarticular fractures involving metaphyseal comminution and displacement were evaluated at a minimum of 1 year following surgery. There were 49 phalangeal bone fractures, 30 intra-articular fractures and 20 associated soft-tissue injuries. The locations of plate placement were lateral in 42 patients and dorsal in 30. The mean duration from injury to surgery was 7.6 days (range, 0-40 days). There were eight examined variables related to patient characteristics (age, gender and hand dominance), fracture characteristics (fracture location, joint involvement and associated soft-tissue injury) and surgical variables (location of plate placement and duration from injury to surgery). Univariate and multivariate linear regression analysis were used to identify the degree to which variables affect %TAM at the final follow-up. Univariate analysis indicated moderate correlations of %TAM with fracture location, associated soft-tissue injury and age. Multiple linear regression modelling including fracture location, age and associated soft-tissue injury resulted in formulae that could account for 46.3% of the variability in %TAM: fracture location (β=-0.388, p<0.001), age (β=-0.339, p<0.001) and associated soft-tissue injury (β=-0.296, p=0.002). 
Phalangeal fracture, increasing age and associated soft-tissue injury were important risk factors to identify the postoperative %TAM in the treatment of comminuted periarticular metacarpal or phalangeal fracture with a titanium plate. Copyright © 2012 Elsevier Ltd. All rights reserved.
Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression
Cai, Xianlei; Wang, Chen; Yu, Wanqi; Fan, Wenjie; Wang, Shan; Shen, Ning; Wu, Pengcheng; Li, Xiuyang; Wang, Fudi
2016-01-01
The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to obtain available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95%CI: 0.73–0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium had efficacy in cancer prevention. However, we did not find a protective efficacy of selenium supplements. High selenium exposure may have different effects on specific types of cancer. It decreased the risk of breast cancer, lung cancer, esophageal cancer, gastric cancer, and prostate cancer, but it was not associated with colorectal cancer, bladder cancer, and skin cancer. PMID:26786590
Estimating the effects of wages on obesity.
Kim, DaeHwan; Leigh, John Paul
2010-05-01
To estimate the effects of wages on obesity and body mass. Data on household heads, aged 20 to 65 years, with full-time jobs, were drawn from the Panel Study of Income Dynamics for 2003 to 2007. The Panel Study of Income Dynamics is a nationally representative sample. Instrumental variables (IV) for wages were created using knowledge of computer software and state legal minimum wages. Least squares (linear regression) with corrected standard errors were used to estimate the equations. Statistical tests revealed both instruments were strong and tests for over-identifying restrictions were favorable. Wages were found to be predictive (P < 0.05) of obesity and body mass in regressions both before and after applying IVs. Coefficient estimates suggested stronger effects in the IV models. Results are consistent with the hypothesis that low wages increase obesity prevalence and body mass.
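The contrast between ordinary least squares and an instrumental-variables (IV) estimate can be sketched with the single-instrument ratio estimator, cov(z, y) / cov(z, x). The toy data below are constructed so that the instrument is exactly uncorrelated with an unobserved confounder; they are not the Panel Study of Income Dynamics data, and the study's own instruments and estimation details are not reproduced here.

```python
def covariance(a, b):
    """Sample covariance (divisor n; the divisor cancels in the ratios below)."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / n


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimator: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Toy data: the instrument z is uncorrelated with the unobserved confounder u.
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]               # regressor driven by z and u
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]   # true effect of x on y is 2
print(ols_slope(x, y), iv_slope(z, x, y))
```

In this toy setup the confounder inflates the least-squares slope while the IV ratio recovers the true coefficient; the direction and size of any such gap in real data depend on the confounding structure, so the example says nothing about the direction of bias in the study itself.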
Kennen, Jonathan G.; Ayers, Mark A.
2002-01-01
Community data from 36 watersheds were used to evaluate the response of fish, invertebrate, and algal assemblages in New Jersey streams to environmental characteristics along a gradient of urban land use that ranged from 3 to 96 percent. Aquatic assemblages were sampled at 36 sites during 1996-98, and more than 400 environmental attributes at multiple spatial scales were summarized. Data matrices were reduced to 43, 170, and 103 species of fish, invertebrates, and algae, respectively, by means of a predetermined joint frequency and relative abundance approach. White sucker (Catostomus commersoni) and Tessellated darter (Etheostoma olmstedi) were the most abundant fishes, accounting for more than 20 and 17 percent, respectively, of the mean abundance. Net-spinning caddisflies (Hydropsychidae) were the most commonly occurring benthic invertebrates and were found at all but one of the 36 sampling sites. Blue-green (for example, Calothrix sp. and Oscillatoria sp.) and green (for example, Protoderma viride) algae were the most widely distributed algae; however, more than 81 percent of the algal taxa collected were diatoms. Principal-component and correlation analyses were used to reduce the dimensionality of the environmental data. Multiple linear regression analysis of extracted ordination axes then was used to develop models that expressed effects of increasing urban land use on the structure of aquatic assemblages. Significant environmental variables identified by using multiple linear regression analysis then were included in a direct gradient analysis. Partial canonical correspondence analysis of relativized abundance data was used to restrict further the effects of residual natural variability, and to identify relations among the environmental variables and the structure of fish, invertebrate, and algal assemblages along an urban land-use gradient. 
Results of this approach, combined with the results of the multiple linear regression analyses, were used to identify human population density (311-37,594 persons/km2), amount and type of impervious surface cover (0.12-1,350 km2), nutrient concentrations (for example, 0.01-0.29 mg/L of phosphorus), hydrologic instability (for example, 100-8,955 ft3/s for 2-year peak flow), the amount of forest and wetlands in a basin (0.01-6.25 km2), and substrate quality (0-87 percent cobble substrate) as variables that are highly correlated with aquatic-assemblage structure. Species distributions in ordination space clearly indicate that tolerant species are more abundant in the streams impaired by urbanization and sensitive taxa are more closely associated with the least impaired basins. The distinct differences in aquatic assemblages along the urban land-use gradient demonstrate the deleterious effects of urbanization on assemblage structure and indicate that conserving landscape attributes that mitigate anthropogenic influences (for example, stormwater-management practices emphasizing infiltration and preservation of existing forests, wetlands, and riparian corridors) will help to maintain the relative abundance of sensitive taxa. Complementary multiple linear regression models indicate that aquatic community indices were correlated with many of the anthropogenic factors that were found to be significant along the urban land-use gradient. These indices appear to be effective in differentiating the moderately and severely impaired streams from the minimally impaired streams. Evaluation of disturbance thresholds for aquatic assemblages indicates that moderate to severe impairment is detectable in New Jersey streams when impervious surface cover in the drainage basin reaches approximately 18 percent.
Association of Frontal and Lateral Facial Attractiveness.
Gu, Jeffrey T; Avilla, David; Devcic, Zlatko; Karimi, Koohyar; Wong, Brian J F
2018-01-01
Despite the large number of studies focused on defining frontal or lateral facial attractiveness, no reports have examined whether a significant association between frontal and lateral facial attractiveness exists. To examine the association between frontal and lateral facial attractiveness and to identify anatomical features that may influence discordance between frontal and lateral facial beauty. Paired frontal and lateral facial synthetic images of 240 white women (age range, 18-25 years) were evaluated from September 30, 2004, to September 29, 2008, using an internet-based focus group (n = 600) on an attractiveness Likert scale of 1 to 10, with 1 being least attractive and 10 being most attractive. Data analysis was performed from December 6, 2016, to March 30, 2017. The association between frontal and lateral attractiveness scores was determined using linear regression. Outliers were defined as data outside the 95% individual prediction interval. To identify features that contribute to score discordance between frontal and lateral attractiveness scores, each of these image pairs was scrutinized by an evaluator panel for facial features that were present in the frontal or lateral projections and absent in the other respective facial projections. Attractiveness scores obtained from internet-based focus groups. For the 240 white women studied (mean [SD] age, 21.4 [2.2] years), attractiveness scores ranged from 3.4 to 9.5 for frontal images and 3.3 to 9.4 for lateral images. The mean (SD) frontal attractiveness score was 6.9 (1.4), whereas the mean (SD) lateral attractiveness score was 6.4 (1.3). Simple linear regression of frontal and lateral attractiveness scores resulted in a coefficient of determination of r2 = 0.749. Eight outlier pairs were identified and analyzed by panel evaluation. Panel evaluation revealed no clinically applicable association between frontal and lateral images among outliers; however, contributory facial features were suggested. 
Thin upper lip, convex nose, and blunt cervicomental angle were suggested by evaluators as facial characteristics that contributed to outlier frontal or lateral attractiveness scores. This study identified a strong linear association between frontal and lateral facial attractiveness. Furthermore, specific facial landmarks responsible for the discordance between frontal and lateral facial attractiveness scores were suggested. Additional studies are necessary to determine whether correction of these landmarks may increase facial harmony and attractiveness.
[The Visual Association Test to study episodic memory in clinical geriatric psychology].
Diesfeldt, Han; Prins, Marleen; Lauret, Gijs
2018-04-01
The Visual Association Test (VAT) is a brief learning task that consists of six line drawings of pairs of interacting objects (association cards). Subjects are asked to name or identify each object and later are presented with one object from the pair (the cue) and asked to name the other (the target). The VAT was administered in a consecutive sample of 174 psychogeriatric day care participants with mild to major neurocognitive disorder. Comparison of test performance with normative data from non-demented subjects revealed that 69% scored within the range of a major deficit (0-8 over two recall trials), 14% a minor, and 17% no deficit (9-10, and ≥10 respectively). VAT-scores correlated with another test of memory function, the Cognitive Screening Test (CST), based on the Short Portable Mental Status Questionnaire (r = 0.53). Tests of executive functioning (Expanded Mental Control Test, Category Fluency, Clock Drawing) did not add significantly to the explanation of variance in VAT-scores. Fifty-five participants (31.6%) were faced with initial problems in naming or identifying one or more objects on the cue cards or association cards. If necessary, naming was aided by the investigator. Initial difficulties in identifying cue objects were associated with lower VAT-scores, but this did not hold for difficulties in identifying target objects. A hierarchical multiple regression analysis was used to examine whether linear or quadratic trends best fitted VAT performance across the range of CST scores. The regression model revealed a linear but not a quadratic trend. The best fitting linear model implied that VAT scores differentiated between CST scores in the lower, as well as in the upper range, indicating the absence of floor and ceiling effects, respectively. 
Moreover, the VAT compares favourably to word list-learning tasks, being more attractive in its presentation of interacting visual objects and cued recall based on incidental learning of the association between cues and targets. For practical purposes and based on documented sensitivity and specificity, Bayesian probability tables give predictive power of age-specific VAT cutoff scores for the presence or absence of a major neurocognitive disorder across a range of a priori probabilities or base rates.
NASA Technical Reports Server (NTRS)
McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Mental chronometry with simple linear regression.
Chen, J Y
1997-10-01
Typically, mental chronometry is performed by means of introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that involved a selective effect and with a task (word superiority effect) that involved a global effect, by the dominant theories. The results indicate (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These ratify the regression approach as a useful method for doing mental chronometry.
Ebhuoma, Osadolor; Gebreslasie, Michael
2016-06-14
Malaria is a serious public health threat in Sub-Saharan Africa (SSA), and its transmission risk varies geographically. Modelling its geographic characteristics is essential for identifying the spatial and temporal risk of malaria transmission. Remote sensing (RS) has been serving as an important tool in providing and assessing a variety of potential climatic/environmental malaria transmission variables in diverse areas. This review focuses on the utilization of RS-driven climatic/environmental variables in determining malaria transmission in SSA. A systematic search on Google Scholar and the Institute for Scientific Information (ISI) Web of Knowledge(SM) databases (PubMed, Web of Science and ScienceDirect) was carried out. We identified thirty-five peer-reviewed articles that studied the relationship between remotely-sensed climatic variable(s) and malaria epidemiological data in the SSA sub-regions. The relationship between malaria disease and different climatic/environmental proxies was examined using different statistical methods. Across the SSA sub-region, the normalized difference vegetation index (NDVI) derived from either the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) or Moderate-resolution Imaging Spectrometer (MODIS) satellite sensors was most frequently returned as a statistically-significant variable to model both spatial and temporal malaria transmission. Furthermore, generalized linear models (linear regression, logistic regression and Poisson regression) were the most frequently-employed methods of statistical analysis in determining malaria transmission predictors in East, Southern and West Africa. By contrast, multivariate analysis was used in Central Africa. 
We stress that the utilization of RS in determining reliable malaria transmission predictors and climatic/environmental monitoring variables would require a tailored approach that will have cognizance of the geographical/climatic setting, the stage of malaria elimination continuum, the characteristics of the RS variables and the analytical approach, which in turn, would support the channeling of intervention resources sustainably.
Ouidir, Marion; Lepeule, Johanna; Siroux, Valérie; Malherbe, Laure; Meleux, Frederik; Rivière, Emmanuel; Launay, Ludivine; Zaros, Cécile; Cheminat, Marie; Charles, Marie-Aline; Slama, Rémy
2017-10-01
Exposure to atmospheric pollutants endangers the health of pregnant women and children. Our objective was to identify individual (socioeconomic and behavioural) and contextual factors associated with exposure to atmospheric pollution during pregnancy at the nationwide level. Among 14 921 women from the French nationwide ELFE (French Longitudinal Study of Children) mother-child cohort recruited in 2011, outdoor exposure levels of PM2.5, PM10 (particulate matter <2.5 µm and <10 µm in diameter) and NO2 (nitrogen dioxide) were estimated at the pregnancy home address from a dispersion model with 1 km resolution. We used classification and regression trees (CART) and linear regression to characterise the association of atmospheric pollutants with individual (maternal age, body mass index, parity, education level, relationship status, smoking status) and contextual (European Deprivation Index, urbanisation level) factors. Patterns of associations were globally similar across pollutants. For the CART approach, the highest tertile of exposure included mainly women not in a relationship living in urban and socially deprived areas, with lower education level. Linear regression models identified different determinants of atmospheric pollutants exposure according to the residential urbanisation level. In urban areas, atmospheric pollutants exposure increased with social deprivation, while in rural areas a U-shaped relationship was observed. We highlighted social inequalities in atmospheric pollutants exposure according to contextual characteristics such as urbanisation level and social deprivation and also according to individual characteristics such as education, being in a relationship and smoking status. In French urban areas, pregnant women from the most deprived neighbourhoods were those most exposed to health-threatening atmospheric pollutants.
Iqbal, Asif; Kim, You-Sam; Kang, Jun-Mo; Lee, Yun-Mi; Rai, Rajani; Jung, Jong-Hyun; Oh, Dong-Yup; Nam, Ki-Chang; Lee, Hak-Kyo; Kim, Jong-Joo
2015-01-01
Meat and carcass quality attributes are of crucial importance in influencing consumer preference and profitability in the pork industry. A set of 400 Berkshire pigs born between 2012 and 2013 was collected from Dasan breeding farm, Namwon, Chonbuk province, Korea. To perform genome wide association studies (GWAS), eleven meat and carcass quality traits were considered, including carcass weight, backfat thickness, pH value after 24 hours (pH24), Commission Internationale de l’Eclairage lightness in meat color (CIE L), redness in meat color (CIE a), yellowness in meat color (CIE b), filtering, drip loss, heat loss, shear force and marbling score. All of the 400 animals were genotyped with the Porcine 62K SNP BeadChips (Illumina Inc., USA). A SAS general linear model procedure (SAS version 9.2) was used to pre-adjust the animal phenotypes before GWAS with sire and sex effects as fixed effects and slaughter age as a covariate. After fitting the fixed and covariate factors in the model, the residuals of the phenotype were regressed on additive effects of each single nucleotide polymorphism (SNP) under a linear regression model (PLINK version 1.07). The significant SNPs after permutation testing at a chromosome-wise level were subjected to stepwise regression analysis to determine the best set of SNP markers. A total of 55 significant (p<0.05) SNPs or quantitative trait loci (QTL) were detected on various chromosomes. The QTLs explained from 5.06% to 8.28% of the total phenotypic variation of the traits. Some QTLs with pleiotropic effect were also identified. A pair of significant QTL for pH24 was also found to affect both CIE L and drip loss percentage. The significant QTL after characterization of the functional candidate genes on the QTL or around the QTL region may be effectively and efficiently used in marker assisted selection to achieve enhanced genetic improvement of the trait considered. PMID:26580276
Lam, Virginie; Dhaliwal, Satvinder S; Mamo, John C
2013-05-01
Ionized calcium (iCa) is the biologically active form of this micronutrient. Serum iCa is measured via ion-electrode potentiometry (IEP), and reporting iCa relative to pH 7.4 is normally utilized to avoid the potential confounding effects of ex vivo changes to serum pH. Adjustment of iCa for pH has not been adequately justified. In this study, utilizing carefully standardized protocols for blood collection, the preparation of serum and controlling time of collection-to-analysis, we determined serum iCa and pH utilizing an IEP-analyser hosted at an accredited diagnostic laboratory. The relationship between unadjusted iCa (iCa(raw)) concentration and pH was described by linear regression and accounted for 37% of serum iCa(raw) variability. iCa(raw) was then expressed at pH 7.4 by either adjusting iCa(raw) based on the linear regression equation describing the association of iCa with serum pH (iCa(regr)) or using IEP coded published normative equations (iCa(pub)). iCa(regr) was comparable to iCa(raw), indicating that blood collection and processing methodologies were sound. However, iCa(pub) yielded values that were significantly lower than iCa(raw). iCa(pub) did not identify 15% of subjects who had greater than desirable serum concentration of iCa based on iCa(raw). Sixty percent of subjects with low levels of iCa(raw) were also not detected by iCa(pub). Determination of the kappa value measure of agreement for iCa(raw) versus iCa(pub) showed relatively poor concordance (κ = 0.42). With simple protocols that avoid sampling artefacts, expressing iCa(raw) is likely to be a more valid and physiologically relevant marker of calcium homeostasis than is iCa(pub).
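The regression-based adjustment described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not report the fitted slope, so the sample values below are synthetic, and the function names `fit_slope` and `adjust_to_reference_ph` are hypothetical rather than taken from the study.

```python
from statistics import mean


def fit_slope(ph_values, ica_values):
    """Least-squares slope of iCa (mmol/L) on serum pH."""
    mean_ph, mean_ica = mean(ph_values), mean(ica_values)
    sxx = sum((p - mean_ph) ** 2 for p in ph_values)
    sxy = sum((p - mean_ph) * (c - mean_ica)
              for p, c in zip(ph_values, ica_values))
    return sxy / sxx


def adjust_to_reference_ph(ica_raw, ph, slope, reference_ph=7.4):
    """Shift a raw iCa reading along the fitted line to the reference pH."""
    return ica_raw + slope * (reference_ph - ph)


# Synthetic illustration values (NOT data from the study).
ph_values = [7.3, 7.4, 7.5]
ica_values = [1.30, 1.25, 1.20]

slope = fit_slope(ph_values, ica_values)
adjusted = adjust_to_reference_ph(1.20, 7.5, slope)
print(f"slope = {slope:.2f} mmol/L per pH unit, adjusted iCa = {adjusted:.2f}")
```

When samples are handled so that serum pH stays close to 7.4, the correction term is small, which is consistent with the abstract's observation that iCa(regr) was comparable to iCa(raw).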
DOE Office of Scientific and Technical Information (OSTI.GOV)
Viani, Gustavo Arruda; Stefano, Eduardo Jose; Afonso, Sergio Luis
2009-08-01
Purpose: To determine in a meta-analysis whether the outcomes in men with localized prostate cancer treated with high-dose radiotherapy (HDRT) are better than those in men treated with conventional-dose radiotherapy (CDRT), by quantifying the effect of the total dose of radiotherapy on biochemical control (BC). Methods and Materials: The MEDLINE, EMBASE, CANCERLIT, and Cochrane Library databases, as well as the proceedings of annual meetings, were systematically searched to identify randomized, controlled studies comparing HDRT with CDRT for localized prostate cancer. To evaluate the dose-response relationship, we conducted a meta-regression analysis of BC ratios by means of weighted linear regression. Results: Seven RCTs with a total patient population of 2812 were identified that met the study criteria. Pooled results from these RCTs showed a significant reduction in the incidence of biochemical failure in those patients with prostate cancer treated with HDRT (p < 0.0001). However, there was no difference in the mortality rate (p = 0.38) and specific prostate cancer mortality rates (p = 0.45) between the groups receiving HDRT and CDRT. However, there were more cases of late Grade >2 gastrointestinal toxicity after HDRT than after CDRT. In the subgroup analysis, patients classified as being at low (p = 0.007), intermediate (p < 0.0001), and high risk (p < 0.0001) of biochemical failure all showed a benefit from HDRT. The meta-regression analysis also detected a linear correlation between the total dose of radiotherapy and biochemical failure (BC = -67.3 + [1.8 x radiotherapy total dose in Gy]; p = 0.04). Conclusions: Our meta-analysis showed that HDRT is superior to CDRT in preventing biochemical failure in low-, intermediate-, and high-risk prostate cancer patients, suggesting that this should be offered as a treatment for all patients, regardless of their risk status.
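As a small worked illustration, the fitted meta-regression line quoted in this abstract can be evaluated at a few total doses. The coefficients (-67.3 and 1.8) come from the abstract; the assumption that BC is expressed in percent, the example doses, and the function name `predicted_bc` are illustrative choices, not part of the source.

```python
def predicted_bc(total_dose_gy: float) -> float:
    """Biochemical control from the line reported in the abstract:
    BC = -67.3 + 1.8 * (total dose in Gy).
    Units of BC are assumed here to be percent."""
    return -67.3 + 1.8 * total_dose_gy


# Hypothetical example doses, chosen only to show the trend.
for dose in (70.0, 74.0, 78.0, 80.0):
    print(f"{dose:.0f} Gy -> predicted BC {predicted_bc(dose):.1f}")
```

Under this line, each additional 10 Gy corresponds to an 18-point increase in predicted BC. Extrapolating outside the dose range of the seven trials would not be justified by the abstract.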
Ebhuoma, Osadolor; Gebreslasie, Michael
2016-01-01
Malaria is a serious public health threat in Sub-Saharan Africa (SSA), and its transmission risk varies geographically. Modelling its geographic characteristics is essential for identifying the spatial and temporal risk of malaria transmission. Remote sensing (RS) has been serving as an important tool in providing and assessing a variety of potential climatic/environmental malaria transmission variables in diverse areas. This review focuses on the utilization of RS-driven climatic/environmental variables in determining malaria transmission in SSA. A systematic search on Google Scholar and the Institute for Scientific Information (ISI) Web of KnowledgeSM databases (PubMed, Web of Science and ScienceDirect) was carried out. We identified thirty-five peer-reviewed articles that studied the relationship between remotely-sensed climatic variable(s) and malaria epidemiological data in the SSA sub-regions. The relationship between malaria disease and different climatic/environmental proxies was examined using different statistical methods. Across the SSA sub-region, the normalized difference vegetation index (NDVI) derived from either the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) or Moderate-resolution Imaging Spectrometer (MODIS) satellite sensors was most frequently returned as a statistically-significant variable to model both spatial and temporal malaria transmission. Furthermore, generalized linear models (linear regression, logistic regression and Poisson regression) were the most frequently-employed methods of statistical analysis in determining malaria transmission predictors in East, Southern and West Africa. By contrast, multivariate analysis was used in Central Africa. 
We stress that the utilization of RS in determining reliable malaria transmission predictors and climatic/environmental monitoring variables would require a tailored approach that will have cognizance of the geographical/climatic setting, the stage of malaria elimination continuum, the characteristics of the RS variables and the analytical approach, which in turn, would support the channeling of intervention resources sustainably. PMID:27314369
Bos-Touwen, Irene; Schuurmans, Marieke; Monninkhof, Evelyn M.; Korpershoek, Yvonne; Spruit-Bentvelzen, Lotte; Ertugrul-van der Graaf, Inge; de Wit, Niek; Trappenburg, Jaap
2015-01-01
A substantial proportion of chronic disease patients do not respond to self-management interventions, which suggests that one size interventions do not fit all, demanding more tailored interventions. To compose more individualized strategies, we aim to increase our understanding of characteristics associated with patient activation for self-management and to evaluate whether these are disease-transcending. A cross-sectional survey study was conducted in primary and secondary care in patients with type-2 Diabetes Mellitus (DM-II), Chronic Obstructive Pulmonary Disease (COPD), Chronic Heart Failure (CHF) and Chronic Renal Disease (CRD). Using multiple linear regression analysis, we analyzed associations between self-management activation (13-item Patient Activation Measure; PAM-13) and a wide range of socio-demographic, clinical, and psychosocial determinants. Furthermore, we assessed whether the associations between the determinants and the PAM were disease-transcending by testing whether disease was an effect modifier. In addition, we identified determinants associated with low activation for self-management using logistic regression analysis. We included 1154 patients (53% response rate); 422 DM-II patients, 290 COPD patients, 223 HF patients and 219 CRD patients. Mean age was 69.6±10.9. Multiple linear regression analysis revealed 9 explanatory determinants of activation for self-management: age, BMI, educational level, financial distress, physical health status, depression, illness perception, social support and underlying disease, explaining a variance of 16.3%. All associations, except for social support, were disease transcending. This study explored factors associated with varying levels of activation for self-management. These results are a first step in supporting clinicians and researchers to identify subpopulations of chronic disease patients less likely to be engaged in self-management. 
Increased scientific efforts are needed to explain the greater part of the factors that contribute to the complex nature of patient activation for self-management. PMID:25950517
Merkel, C; Gatta, A; Bellumat, A; Bolognesi, M; Borsato, L; Caregaro, L; Cavallarin, G; Cielo, R; Cristina, P; Cucci, E; Donada, C; Donadon, V; Enzo, E; Martin, R; Mazzaro, C; Sacerdoti, D; Torboli, P
1996-01-01
To identify the best time-frame for defining bleeding-related death after variceal bleeding in patients with cirrhosis. Prospective long-term evaluation of a cohort of 155 patients admitted with variceal bleeding. Eight medical departments in seven hospitals in north-eastern Italy. Non-linear regression analysis of a hazard curve for death, and Cox's multiple regression analyses using different zero-time points. Cumulative hazard plots gave two slopes, the first corresponding to the risk of death from acute bleeding, the second a baseline risk of death. The first 30 days were outside the confidence limits of the regression curve for the baseline risk of death. Using Cox's regression analysis, the significant predictors of overall mortality risk were balanced between factors related to severity of bleeding and those related to severity of liver disease. If only deaths occurring after 30 days were considered, only predictors related to the severity of liver disease were found to be of importance. Thirty days after bleeding is considered to be a reasonable time-frame for the definition of bleeding-related death in patients with cirrhosis and variceal bleeding.
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression
Peng, Limin; Xu, Jinfeng; Kutner, Nancy
2013-01-01
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Functional genomic Landscape of Human Breast Cancer drivers, vulnerabilities, and resistance
Marcotte, Richard; Sayad, Azin; Brown, Kevin R.; Sanchez-Garcia, Felix; Reimand, Jüri; Haider, Maliha; Virtanen, Carl; Bradner, James E.; Bader, Gary D.; Mills, Gordon B.; Pe’er, Dana; Moffat, Jason; Neel, Benjamin G.
2016-01-01
Summary Large-scale genomic studies have identified multiple somatic aberrations in breast cancer, including copy number alterations, and point mutations. Still, identifying causal variants and emergent vulnerabilities that arise as a consequence of genetic alterations remain major challenges. We performed whole genome shRNA “dropout screens” on 77 breast cancer cell lines. Using a hierarchical linear regression algorithm to score our screen results and integrate them with accompanying detailed genetic and proteomic information, we identify vulnerabilities in breast cancer, including candidate “drivers,” and reveal general functional genomic properties of cancer cells. Comparisons of gene essentiality with drug sensitivity data suggest potential resistance mechanisms, effects of existing anti-cancer drugs, and opportunities for combination therapy. Finally, we demonstrate the utility of this large dataset by identifying BRD4 as a potential target in luminal breast cancer, and PIK3CA mutations as a resistance determinant for BET-inhibitors. PMID:26771497
Pimperl, Alexander F; Rodriguez, Hector P; Schmittdiel, Julie A; Shortell, Stephen M
2018-06-01
To identify positive deviant (PD) physician organizations of Accountable Care Organizations (ACOs) with robust performance management systems (PMSYS). Third National Survey of Physician Organizations (NSPO3, n = 1,398). Organizational and external factors from NSPO3 were analyzed. Linear regression estimated the association of internal and contextual factors on PMSYS. Two cutpoints (75th/90th percentiles) identified PDs with the largest residuals and highest PMSYS scores. A total of 65 and 41 PDs were identified using 75th and 90th percentiles cutpoints, respectively. The 90th percentile more strongly differentiated PDs from non-PDs. Having a high proportion of vulnerable patients appears to constrain PMSYS development. Our PD identification method increases the likelihood that PD organizations selected for in-depth inquiry are high-performing organizations that exceed expectations. © Health Research and Educational Trust.
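The residual-based identification of positive deviants described above could be sketched as follows. This is a simplified illustration, not the study's procedure: it uses a single predictor instead of the multiple internal and contextual factors, it assumes a PD must exceed the percentile cutpoint on both residual and raw score, and the data and names (`fit_line`, `percentile`, `positive_deviants`) are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def positive_deviants(x, y, q=90.0):
    """Indices whose residual AND score both reach the q-th percentile."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    r_cut = percentile(residuals, q)
    y_cut = percentile(y, q)
    return [i for i, (r, yi) in enumerate(zip(residuals, y))
            if r >= r_cut and yi >= y_cut]


# Synthetic scores (NOT survey data): the last organization far exceeds
# what the single contextual factor x would predict.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 20]
print(positive_deviants(x, y, q=90.0))
```

Requiring both a large residual and a high score mirrors the abstract's aim of selecting organizations that are high-performing and also exceed expectations given their context.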
Comparison of Mental Health Treatment Adequacy and Costs in Public Hospitals in Boston and Madrid.
Carmona, Rodrigo; Cook, Benjamin Lê; Baca-García, Enrique; Chavez, Ligia; Alvarez, Kiara; Iza, Miren; Alegría, Margarita
2018-03-07
Analyses of healthcare expenditures and adequacy are needed to identify cost-effective policies and practices that improve mental healthcare quality. Data are from 2010 to 2012 electronic health records from three hospital psychiatry departments in Madrid (n = 29,944 person-years) and three in Boston (n = 14,109 person-years). Two-part multivariate generalized linear regression and logistic regression models were estimated to identify site differences in mental healthcare expenditures and quality of care. Annual total average treatment expenditures were $4442.14 in Boston and $2277.48 in Madrid. Boston patients used inpatient services more frequently and had higher 30-day re-admission rates (23.7 vs. 8.7%) despite higher rates of minimally adequate care (49.5 vs. 34.8%). Patients in Madrid were more likely to receive psychotropic medication, had fewer inpatient stays and readmissions, and had lower expenditures, but had lower rates of minimally adequate care. Differences in insurance and healthcare system policies and mental health professional roles may explain these dissimilarities.
Kim, Dae-Hee; Choi, Jae-Hun; Lim, Myung-Eun; Park, Soo-Jun
2008-01-01
This paper suggests a method of correcting the distance between an ambient intelligence display and a user based on linear regression and smoothing, by which distance information of a user who approaches the display can be accurately output even in an unanticipated condition using a passive infrared (PIR) sensor and an ultrasonic device. The developed system consists of an ambient intelligence display, an ultrasonic transmitter, and a sensor gateway. The modules communicate with each other through RF (radio frequency) communication. The ambient intelligence display includes an ultrasonic receiver and a PIR sensor for motion detection. In particular, this system dynamically selects an algorithm such as smoothing or linear regression for processing the current input data, through a judgment process that uses the previous reliable data stored in a queue. In addition, we implemented GUI software in Java for real-time location tracking and an ambient intelligence display.
How is the weather? Forecasting inpatient glycemic control
Saulnier, George E; Castro, Janna C; Cook, Curtiss B; Thompson, Bithika M
2017-01-01
Aim: Apply methods of damped trend analysis to forecast inpatient glycemic control. Method: Observed and calculated point-of-care blood glucose data trends were determined over 62 weeks. Mean absolute percent error was used to calculate differences between observed and forecasted values. Comparisons were drawn between model results and linear regression forecasting. Results: The forecasted mean glucose trends observed during the first 24 and 48 weeks of projections compared favorably to the results provided by linear regression forecasting. However, in some scenarios, the damped trend method changed inferences compared with linear regression. In all scenarios, mean absolute percent error values remained below the 10% accepted by demand industries. Conclusion: Results indicate that forecasting methods historically applied within demand industries can project future inpatient glycemic control. Additional study is needed to determine if forecasting is useful in the analyses of other glucometric parameters and, if so, how to apply the techniques to quality improvement. PMID:29134125
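A minimal sketch of the two ingredients named above, damped trend forecasting and mean absolute percent error, is given here. It assumes the standard additive damped-trend form of exponential smoothing; the abstract does not state which variant or which smoothing constants were used, so `alpha`, `beta`, `phi` and the simple initialization are illustrative choices only.

```python
def damped_trend_forecast(y, horizon, alpha=0.5, beta=0.3, phi=0.9):
    """Additive damped-trend exponential smoothing.

    level_t = alpha*y_t + (1-alpha)*(level_{t-1} + phi*trend_{t-1})
    trend_t = beta*(level_t - level_{t-1}) + (1-beta)*phi*trend_{t-1}
    forecast_h = level_T + (phi + phi^2 + ... + phi^h) * trend_T
    """
    level, trend = y[0], y[1] - y[0]  # naive initialization (assumption)
    for obs in y[1:]:
        prev = level
        level = alpha * obs + (1 - alpha) * (prev + phi * trend)
        trend = beta * (level - prev) + (1 - beta) * phi * trend
    forecasts, damp = [], 0.0
    for h in range(1, horizon + 1):
        damp += phi ** h  # cumulative damping factor
        forecasts.append(level + damp * trend)
    return forecasts

def mape(actual, forecast):
    """Mean absolute percent error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

With `phi` below 1 the forecast increments shrink at each step, so the projection flattens out instead of extrapolating the last trend indefinitely, which is where it can diverge from a straight regression line.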
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-01-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412
Liquid electrolyte informatics using an exhaustive search with linear regression.
Sodeyama, Keitaro; Igarashi, Yasuhiko; Nakayama, Tomofumi; Tateyama, Yoshitaka; Okada, Masato
2018-06-14
Exploring new liquid electrolyte materials is a fundamental target for developing new high-performance lithium-ion batteries. In contrast to solid materials, disordered liquid solution properties have been less studied by data-driven information techniques. Here, we examined the estimation accuracy and efficiency of three information techniques, multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), and exhaustive search with linear regression (ES-LiR), by using coordination energy and melting point as test liquid properties. We then confirmed that ES-LiR gives the most accurate estimation among the techniques. We also found that ES-LiR can provide the relationship between the "prediction accuracy" and "calculation cost" of the properties via a weight diagram of descriptors. This technique makes it possible to choose the balance between "accuracy" and "cost" when searching a large number of new materials.
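The idea of an exhaustive search over descriptor subsets can be sketched as below. This is a hedged illustration, not the authors' implementation: it assumes each subset is scored by leave-one-out cross-validation error of an ordinary least-squares fit, and the helper names (`solve`, `ols`, `loo_error`, `exhaustive_search`) are hypothetical.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(A, b)

def loo_error(X, y, cols):
    """Leave-one-out mean squared prediction error using only columns in cols."""
    sub = [[row[c] for c in cols] for row in X]
    total = 0.0
    for i in range(len(y)):
        coef = ols(sub[:i] + sub[i + 1:], y[:i] + y[i + 1:])
        pred = coef[0] + sum(c * v for c, v in zip(coef[1:], sub[i]))
        total += (y[i] - pred) ** 2
    return total / len(y)

def exhaustive_search(X, y):
    """Score every non-empty descriptor subset and return the best one."""
    n_feat = len(X[0])
    subsets = [c for k in range(1, n_feat + 1)
               for c in combinations(range(n_feat), k)]
    return min(subsets, key=lambda cols: loo_error(X, y, cols))
```

The number of subsets grows as 2^p - 1, so a search like this is only practical for a modest number of descriptors; that cost is the other side of the accuracy trade-off the abstract mentions.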
Neonatal MRI is associated with future cognition and academic achievement in preterm children
Spencer-Smith, Megan; Thompson, Deanne K.; Doyle, Lex W.; Inder, Terrie E.; Anderson, Peter J.; Klingberg, Torkel
2015-01-01
School-age children born preterm are particularly at risk for low mathematical achievement, associated with reduced working memory and number skills. Early identification of preterm children at risk for future impairments using brain markers might assist in referral for early intervention. This study aimed to examine the use of neonatal magnetic resonance imaging measures derived from automated methods (Jacobian maps from deformation-based morphometry; fractional anisotropy maps from diffusion tensor images) to predict skills important for mathematical achievement (working memory, early mathematical skills) at 5 and 7 years in a cohort of preterm children using both univariable (general linear model) and multivariable models (support vector regression). Participants were preterm children born <30 weeks’ gestational age and healthy control children born ≥37 weeks’ gestational age at the Royal Women’s Hospital in Melbourne, Australia between July 2001 and December 2003 and recruited into a prospective longitudinal cohort study. At term-equivalent age ( ±2 weeks) 224 preterm and 46 control infants were recruited for magnetic resonance imaging. Working memory and early mathematics skills were assessed at 5 years (n = 195 preterm; n = 40 controls) and 7 years (n = 197 preterm; n = 43 controls). In the preterm group, results identified localized regions around the insula and putamen in the neonatal Jacobian map that were positively associated with early mathematics at 5 and 7 years (both P < 0.05), even after covarying for important perinatal clinical factors using general linear model but not support vector regression. The neonatal Jacobian map showed the same trend for association with working memory at 7 years (models ranging from P = 0.07 to P = 0.05). 
Neonatal fractional anisotropy was positively associated with working memory and early mathematics at 5 years (both P < 0.001) even after covarying for clinical factors using support vector regression but not general linear model. These significant relationships were not observed in the control group. In summary, we identified, in the preterm brain, regions around the insula and putamen using neonatal deformation-based morphometry, and brain microstructural organization using neonatal diffusion tensor imaging, associated with skills important for childhood mathematical achievement. Results contribute to the growing evidence for the clinical utility of neonatal magnetic resonance imaging for early identification of preterm infants at risk for childhood cognitive and academic impairment. PMID:26329284
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
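For the special case of squared-error loss, a weighted ℓ1-penalized fit and a recursive (multistage) adaptive Lasso can be sketched as follows. Everything here is an illustrative assumption rather than the paper's algorithm: the objective is (1/2n)·||y − Xβ||² + λ·Σ w_j·|β_j|, it is minimized by plain coordinate descent, and the reweighting rule w_j = 1/(|β_j| + eps) is one common choice.

```python
def soft(z, t):
    """Soft-thresholding operator."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def weighted_lasso(X, y, lam, w, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j w[j]*|b_j|.

    Assumes no column of X is identically zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, kept up to date
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft(rho, lam * w[j]) / z[j]
            if new != beta[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - beta[j])
                beta[j] = new
    return beta

def multistage_adaptive_lasso(X, y, lam, stages=3, eps=1e-3):
    """Stage 1 is the plain Lasso; later stages reweight by the previous fit."""
    w = [1.0] * len(X[0])
    for _ in range(stages):
        beta = weighted_lasso(X, y, lam, w)
        w = [1.0 / (abs(b) + eps) for b in beta]  # illustrative reweighting
    return beta
```

Large coefficients receive small weights in the next stage and are shrunk less, while coefficients already at zero receive very large weights and stay at zero, which is how repeated reweighting mimics a concave penalty.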
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-01-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression. PMID:25598560
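For reference, the local linear approximation step mentioned above can be written out as a weighted ℓ1 problem. The notation is an assumption introduced here for illustration (loss ℓ_n, folded concave penalty p_λ, current iterate β̂^(m)), and the SCAD derivative with a > 2 is shown as one familiar example of such a penalty:

```latex
\hat{\beta}^{(m+1)}
  = \arg\min_{\beta}\; \ell_n(\beta)
    + \sum_{j=1}^{p} p'_{\lambda}\bigl(|\hat{\beta}^{(m)}_j|\bigr)\,|\beta_j| ,
\qquad
p'_{\lambda}(t)
  = \lambda\Bigl\{ \mathbf{1}(t \le \lambda)
    + \frac{(a\lambda - t)_{+}}{(a-1)\lambda}\,\mathbf{1}(t > \lambda) \Bigr\}.
```

Each iteration is therefore a Lasso with coordinate-specific weights; the one-step estimator is the result of a single such update from the initial estimate.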
Biostatistics Series Module 10: Brief Overview of Multivariate Methods.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. 
The calculation intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with wider availability, and increasing sophistication of statistical software and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
ERIC Educational Resources Information Center
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
Suppression Situations in Multiple Linear Regression
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
Brunton, Steven L; Brunton, Bingni W; Proctor, Joshua L; Kutz, J Nathan
2016-01-01
In this work, we explore finite-dimensional linear representations of nonlinear dynamical systems by restricting the Koopman operator to an invariant subspace spanned by specially chosen observable functions. The Koopman operator is an infinite-dimensional linear operator that evolves functions of the state of a dynamical system. Dominant terms in the Koopman expansion are typically computed using dynamic mode decomposition (DMD). DMD uses linear measurements of the state variables, and it has recently been shown that this may be too restrictive for nonlinear systems. Choosing the right nonlinear observable functions to form an invariant subspace where it is possible to obtain linear reduced-order models, especially those that are useful for control, is an open challenge. Here, we investigate the choice of observable functions for Koopman analysis that enable the use of optimal linear control techniques on nonlinear problems. First, to include a cost on the state of the system, as in linear quadratic regulator (LQR) control, it is helpful to include these states in the observable subspace, as in DMD. However, we find that this is only possible when there is a single isolated fixed point, as systems with multiple fixed points or more complicated attractors are not globally topologically conjugate to a finite-dimensional linear system, and cannot be represented by a finite-dimensional linear Koopman subspace that includes the state. We then present a data-driven strategy to identify relevant observable functions for Koopman analysis by leveraging a new algorithm to determine relevant terms in a dynamical system by ℓ1-regularized regression of the data in a nonlinear function space; we also show how this algorithm is related to DMD. Finally, we demonstrate the usefulness of nonlinear observable subspaces in the design of Koopman operator optimal control laws for fully nonlinear systems using techniques from linear optimal control.
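A toy example may help make the notion of a Koopman-invariant subspace concrete. The discrete-time map below, its parameters `a`, `b`, `c`, and all function names are hypothetical choices for illustration and are not taken from the abstract. The map has a single fixed point at the origin, and the three observables (x1, x2, x1²) close under the dynamics, so a 3×3 matrix advances them exactly.

```python
def step(x1, x2, a=0.9, b=0.5, c=0.3):
    """One step of a toy nonlinear map with a single fixed point at the origin."""
    return a * x1, b * x2 + c * x1 ** 2

def lift(x1, x2):
    """Observables spanning the invariant subspace; the state itself is included."""
    return [x1, x2, x1 ** 2]

def koopman_matrix(a=0.9, b=0.5, c=0.3):
    """Linear evolution of (x1, x2, x1^2): note (a*x1)^2 = a^2 * x1^2."""
    return [[a, 0.0, 0.0],
            [0.0, b, c],
            [0.0, 0.0, a * a]]

def matvec(K, v):
    return [sum(k * vi for k, vi in zip(row, v)) for row in K]

def max_mismatch(x1, x2, steps=20):
    """Largest gap between the nonlinear trajectory and the lifted linear one."""
    K = koopman_matrix()
    z = lift(x1, x2)
    worst = 0.0
    for _ in range(steps):
        x1, x2 = step(x1, x2)   # nonlinear update in state space
        z = matvec(K, z)        # linear update in observable space
        worst = max(worst, abs(z[0] - x1), abs(z[1] - x2))
    return worst
```

Because the lifted system is linear and carries the state in its first two coordinates, a state cost of the LQR type can be expressed directly in the observables, which is the situation the abstract describes for systems with a single isolated fixed point.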
Marrero-Ponce, Yovani; Medina-Marrero, Ricardo; Castillo-Garit, Juan A; Romero-Zaldivar, Vicente; Torrens, Francisco; Castro, Eduardo A
2005-04-15
A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole protein) and local (one or more amino acid) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid level biochemical descriptors are based on the calculation of linear maps on R^n [f_k(x_mi): R^n -> R^n] in canonical basis. These bio-macromolecular indices are calculated from the kth power of the macromolecular pseudograph alpha-carbon atom adjacency matrix. Total linear indices are linear functionals on R^n. That is, the kth total linear indices are linear maps from R^n to the scalar R [f_k(x_m): R^n -> R]. Thus, the kth total linear indices are calculated by summing the amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in the Arc repressor illustrates this approach. A quantitative model that discriminates near wild-type stability alanine mutants from the reduced-stability ones in a training series was obtained. This model permitted the correct classification of 97.56% (40/41) and 91.67% (11/12) of proteins in the training and test set, respectively. It shows a high Matthews correlation coefficient (MCC=0.952) for the training set and an MCC=0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model (Rcanc=0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. On the other hand, the linear piecewise regression model compared favorably with respect to the linear regression one on predicting the melting temperature (tm) of the Arc alanine mutants. The linear model explains almost 81% of the variance of the experimental tm (R=0.90 and s=4.29) and the LOO press statistics evidenced its predictive ability (q^2=0.72 and scv=4.79). 
Moreover, the TOMOCOMD-CAMPS method produced a linear piecewise regression (R=0.97) between protein backbone descriptors and tm values for alanine mutants of the Arc repressor. A break-point value of 51.87 degrees C characterized two mutant clusters and coincided perfectly with the experimental scale. For this reason, we can use the linear discriminant analysis and piecewise models in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted the interpretation of the driving forces of such folding process, indicating that topologic/topographic protein backbone interactions control the stability profile of wild-type Arc and its alanine mutants.
Wu, Lingtao; Lord, Dominique
2017-05-01
This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on the misspecification in the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models or GLMs with additive linear link functions) when some of the variables have nonlinear relationships and quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios. The results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially around areas away from the baseline conditions (e.g., boundary areas); (2) with the increase in nonlinearity (i.e., nonlinear relationship becomes stronger), the bias becomes more significant; (3) the quality of CMFs for other variables having linear relationships can be influenced when mixed with those having nonlinear relationships, but the accuracy may still be acceptable; and (4) the misuse of the link function for one or more variables can also lead to biased estimates for other parameters. This study raised the importance of the link function when using regression models for developing CMFs. Copyright © 2017 Elsevier Ltd. All rights reserved.
Rodríguez, A; Reyes, L F; Monclou, J; Suberviola, B; Bodí, M; Sirgo, G; Solé-Violán, J; Guardiola, J; Barahona, D; Díaz, E; Martín-Loeches, I; Restrepo, M I
2018-02-09
Serum procalcitonin (PCT) concentration could be increased in patients with renal dysfunction in the absence of bacterial infection. To determine the interactions among serum renal biomarkers of acute kidney injury (AKI) and serum PCT concentration, in patients admitted to the intensive care unit (ICU) due to lung influenza infection. Secondary analysis of a prospective multicentre observational study. 148 Spanish ICUs. ICU patients admitted with influenza infection without bacterial co-infection. Clinical, laboratory and hemodynamic variables were recorded. AKI was classified as AKI I or II based on creatinine (Cr) concentrations (Cr ≥1.60-2.50 mg/dL and Cr ≥2.51-3.99 mg/dL, respectively). Patients with chronic renal disease, receiving renal replacement treatment or with Cr >4 mg/dL were excluded. Spearman's correlation, simple and multiple linear regression analysis were performed. None. Out of 663 patients included in the study, 52 (8.2%) and 10 (1.6%) developed AKI I and II, respectively. Patients with AKI were significantly older, had more comorbid conditions and were more severely ill. PCT concentrations were higher in patients with AKI (2.62 [0.60-10.0] ng/mL vs. 0.40 [0.13-1.20] ng/mL, p=0.002). Weak correlations between Cr/PCT (rho=0.18) and Urea (U)/PCT (rho=0.19) were identified. Simple linear regression showed poor interaction between Cr/U and PCT concentrations (Cr R²=0.03 and U R²=0.018). Similar results were observed during multiple linear regression analysis (Cr R²=0.046 and U R²=0.013). Although PCT concentrations were slightly higher in patients with AKI, high PCT concentrations are not explained by AKI and could be a warning sign of a potential bacterial infection. Copyright © 2018 Elsevier España, S.L.U. y SEMICYUC. All rights reserved.
Cronin, Matthew A.; Amstrup, Steven C.; Durner, George M.; Noel, Lynn E.; McDonald, Trent L.; Ballard, Warren B.
1998-01-01
There is concern that caribou (Rangifer tarandus) may avoid roads and facilities (i.e., infrastructure) in the Prudhoe Bay oil field (PBOF) in northern Alaska, and that this avoidance can have negative effects on the animals. We quantified the relationship between caribou distribution and PBOF infrastructure during the post-calving period (mid-June to mid-August) with aerial surveys from 1990 to 1995. We conducted four to eight surveys per year with complete coverage of the PBOF. We identified active oil field infrastructure and used a geographic information system (GIS) to construct ten 1 km wide concentric intervals surrounding the infrastructure. We tested whether caribou distribution is related to distance from infrastructure with a chi-squared habitat utilization-availability analysis and log-linear regression. We considered bulls, calves, and total caribou of all sex/age classes separately. The habitat utilization-availability analysis indicated there was no consistent trend of attraction to or avoidance of infrastructure. Caribou frequently were more abundant than expected in the intervals close to infrastructure, and this trend was more pronounced for bulls and for total caribou of all sex/age classes than for calves. Log-linear regression (with Poisson error structure) of numbers of caribou on distance from infrastructure was also done, with and without combining data into the 1 km distance intervals. The analysis without intervals revealed no relationship between caribou distribution and distance from oil field infrastructure, or between caribou distribution and Julian date, year, or distance from the Beaufort Sea coast. The log-linear regression with caribou combined into distance intervals showed the density of bulls and total caribou of all sex/age classes declined with distance from infrastructure. 
Our results indicate that during the post-calving period: 1) caribou distribution is largely unrelated to distance from infrastructure; 2) caribou regularly use habitats in the PBOF; 3) caribou often occur close to infrastructure; and 4) caribou do not appear to avoid oil field infrastructure.
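A log-linear regression with Poisson error structure, as used above for counts against distance, can be sketched with a single predictor. This is an illustrative sketch only: it assumes the model log(mu) = b0 + b1·d fitted by Newton-Raphson on the Poisson log-likelihood, and the function name and starting values are hypothetical rather than taken from the study.

```python
from math import exp, log

def poisson_loglinear(d, y, n_iter=25):
    """Fit log(mu) = b0 + b1*d by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [exp(b0 + b1 * di) for di in d]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(di * (yi - mi) for di, yi, mi in zip(d, y, mu))
        # Fisher information (negative Hessian), a 2x2 symmetric matrix
        h00 = sum(mu)
        h01 = sum(di * mi for di, mi in zip(d, mu))
        h11 = sum(di * di * mi for di, mi in zip(d, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

A negative fitted `b1` would correspond to expected counts declining with distance, and exp(b1) is the multiplicative change in the expected count per unit of distance.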
NASA Astrophysics Data System (ADS)
Shan, X.; Zhang, K.; Zhuang, Y.; Fu, R.; Hong, Y.
2017-12-01
Seasonal prediction of rainfall during the dry-to-wet transition season in austral spring (September-November) over southern Amazonia is central for improving planting crops and fire mitigation in that region. Previous studies have identified the key large-scale atmospheric dynamic and thermodynamics pre-conditions during the dry season (June-August) that influence the rainfall anomalies during the dry to wet transition season over Southern Amazonia. Based on these key pre-conditions during dry season, we have evaluated several statistical models and developed a Neural Network based statistical prediction system to predict rainfall during the dry to wet transition for Southern Amazonia (5-15°S, 50-70°W). Multivariate Empirical Orthogonal Function (EOF) Analysis is applied to the following four fields during JJA from the ECMWF Reanalysis (ERA-Interim) spanning from year 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition energy (CIN) index and convective available potential energy (CAPE), to filter out noise and highlight the most coherent spatial and temporal variations. The first 10 EOF modes are retained for inputs to the statistical models, accounting for at least 70% of the total variance in the predictor fields. We have tested several linear and non-linear statistical methods. While the regularized Ridge Regression and Lasso Regression can generally capture the spatial pattern and magnitude of rainfall anomalies, we found that that Neural Network performs best with an accuracy greater than 80%, as expected from the non-linear dependence of the rainfall on the large-scale atmospheric thermodynamic conditions and circulation. Further tests of various prediction skill metrics and hindcasts also suggest this Neural Network prediction approach can significantly improve seasonal prediction skill than the dynamic predictions and regression based statistical predictions. 
Thus, this statistical prediction system shows potential to improve real-time seasonal rainfall predictions in the future.
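The contrast between the two regularized regressions named in this abstract can be sketched in a few lines. This is only an illustration, not the authors' system: it assumes the retained EOF time series have been scaled to be orthonormal (X^T X = I), that ridge minimizes (1/2)||y - Xb||^2 + (lam/2)||b||^2 and lasso minimizes (1/2)||y - Xb||^2 + lam*||b||_1, and the coefficient values are hypothetical. Under those assumptions each coefficient has a closed form.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge estimate per coefficient for orthonormal predictors: uniform shrinkage."""
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    """Lasso estimate per coefficient for orthonormal predictors: soft-thresholding."""
    shrunk = []
    for b in beta_ols:
        magnitude = max(abs(b) - lam, 0.0)  # small coefficients are set to zero
        shrunk.append(magnitude if b >= 0 else -magnitude)
    return shrunk


# Hypothetical least-squares coefficients for three EOF modes.
beta_ols = [2.0, -0.5, 0.1]
lam = 0.25

ridge = ridge_shrink(beta_ols, lam)  # every coefficient scaled by 1 / (1 + lam)
lasso = lasso_shrink(beta_ols, lam)  # every coefficient moved toward zero by lam
```

Ridge keeps all modes but damps them; lasso drops the weakest mode entirely, which is why it is often read as a variable-selection method.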
Cardiac surgery productivity and throughput improvements.
Lehtonen, Juha-Matti; Kujala, Jaakko; Kouri, Juhani; Hippeläinen, Mikko
2007-01-01
The high variability in cardiac surgery length is one of the main challenges for staff managing productivity. This study aims to evaluate the impact of six interventions on open-heart surgery operating theatre productivity. A discrete operating theatre event simulation model with empirical operation time input data from 2603 patients is used to evaluate the effect that these process interventions have on the surgery output and overtime work. A linear regression model was used to produce operation time forecasts for surgery scheduling; it could also be used to explain operation time. A forecasting model based on the linear regression of variables available before the surgery explains 46 per cent of operating time variance. The main factors influencing operation length were type of operation, redoing the operation and the head surgeon. Reduction of changeover time between surgeries by inducing anaesthesia outside an operating theatre and by reducing slack time at the end of day after a second surgery have the strongest effects on surgery output and productivity. A more accurate operation time forecast did not have any effect on output, although improved operation time forecast did decrease overtime work. A reduction in the operation time itself is not studied in this article. However, the forecasting model can also be applied to discover which factors are most significant in explaining variation in the length of open-heart surgery. The challenge in scheduling two open-heart surgeries in one day can be partly resolved by increasing the length of the day, decreasing the time between two surgeries or by improving patient scheduling procedures so that two short surgeries can be paired. A linear regression model is created in the paper to increase the accuracy of operation time forecasting and to identify factors that have the most influence on operation time.
A simulation model is used to analyse the impact of improved surgical length forecasting and five selected process interventions on productivity in cardiac surgery.
Explaining Match Outcome During The Men’s Basketball Tournament at The Olympic Games
Leicht, Anthony S.; Gómez, Miguel A.; Woods, Carl T.
2017-01-01
In preparation for the Olympics, there is a limited opportunity for coaches and athletes to interact regularly with team performance indicators providing important guidance to coaches for enhanced match success at the elite level. This study examined the relationship between match outcome and team performance indicators during men’s basketball tournaments at the Olympic Games. Twelve team performance indicators were collated from all men’s teams and matches during the basketball tournament of the 2004-2016 Olympic Games (n = 156). Linear and non-linear analyses examined the relationship between match outcome and team performance indicator characteristics; namely, binary logistic regression and a conditional interference (CI) classification tree. The most parsimonious logistic regression model retained ‘assists’, ‘defensive rebounds’, ‘field-goal percentage’, ‘fouls’, ‘fouls against’, ‘steals’ and ‘turnovers’ (delta AIC <0.01; Akaike weight = 0.28) with a classification accuracy of 85.5%. Conversely, four performance indicators were retained with the CI classification tree with an average classification accuracy of 81.4%. However, it was the combination of ‘field-goal percentage’ and ‘defensive rebounds’ that provided the greatest probability of winning (93.2%). Match outcome during the men’s basketball tournaments at the Olympic Games was identified by a unique combination of performance indicators. Despite the average model accuracy being marginally higher for the logistic regression analysis, the CI classification tree offered a greater practical utility for coaches through its resolution of non-linear phenomena to guide team success. Key points A unique combination of team performance indicators explained 93.2% of winning observations in men’s basketball at the Olympics. Monitoring of these team performance indicators may provide coaches with the capability to devise multiple game plans or strategies to enhance their likelihood of winning. 
Incorporation of machine learning techniques with team performance indicators may provide a valuable and strategic approach to explain patterns within multivariate datasets in sport science. PMID:29238245
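The abstract above ranks candidate logistic regression models by delta AIC and Akaike weight. As a minimal sketch of how such weights are usually computed (the AIC values below are hypothetical and are not taken from the study), each model's weight is exp(-delta/2) normalized over all candidates, where delta is the model's AIC minus the smallest AIC.

```python
import math


def akaike_weights(aic_values):
    """Return the Akaike weight of each candidate model given their AIC values."""
    best = min(aic_values)
    # Relative likelihood of each model compared with the best one.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical AIC values for three candidate models.
aics = [100.0, 102.0, 110.0]
weights = akaike_weights(aics)
```

The weights sum to one and can be read as the relative support for each model within the candidate set, so a weight of 0.28 for the best model would indicate that several competing models remain plausible.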
DeNino, Walter F; Osler, Turner; Evans, Ellen G; Forgione, Patrick M
2010-01-01
Despite the 2008 "American Association of Clinical Endocrinologists, The Obesity Society, and American Society for Metabolic and Bariatric Surgery Medical Guidelines for Clinical Practice for the Perioperative Nutritional, Metabolic, and Nonsurgical Support of the Bariatric Surgery Patient," consensus does not exist for postoperative care in laparoscopic adjustable gastric banding (LAGB) patients (grade D evidence). It has been suggested that regular follow-up is related to better outcomes, specifically greater weight loss. The aim of the present study was to investigate the effects of travel distance to the clinic on the adherence to follow-up visits and weight loss in a cohort of LAGB patients in the setting of a rural, university-affiliated teaching hospital in the United States. A retrospective chart review was performed of all consecutive LAGB patients for a 1-year period. Linear regression analysis was used to identify the relationships between appointment compliance and the distance traveled and between the amount of weight loss and the distance traveled. Linear regression analysis was performed to investigate the effect of the travel distance to the clinic on the percentage of follow-up visits postoperatively. This effect was not significant (P = .4). Linear regression analysis was also performed to elucidate the effect of the travel distance to the clinic on the amount of weight loss. This effect was significant (P = .04). The travel distance to the clinic did not seem to be a significant predictor of compliance in a cohort of LAGB patients with ≤ 1 year of follow-up in a rural setting. However, a weak relationship was found between the travel distance to the clinic and weight loss, with patients who traveled further seeming to lose slightly more weight. Copyright © 2010 American Society for Metabolic and Bariatric Surgery. Published by Elsevier Inc. All rights reserved.
Fogelholm, M; Kanerva, N; Männistö, S
2015-09-01
High consumption of meat has been linked with the risk for obesity and chronic diseases. This could partly be explained by the association between meat and lower-quality diet. We studied whether high intake of red and processed meat was associated with lower-quality dietary habits, assessed against selected nutrients, other food groups and total diet. Moreover, we studied whether meat consumption was associated with obesity, after adjustment for all identified associations between meat and food consumption. The nationally representative cross-sectional study population consisted of 2190 Finnish men and 2530 women, aged 25-74 years. Food consumption over the previous 12 months was assessed using a validated 131-item Food Frequency Questionnaire. Associations between nutrients, foods, a modified Baltic Sea Diet Score and meat consumption (quintile classification) were analysed using linear regression. The models were adjusted for age and energy intake and additionally for education, physical activity and smoking. High consumption of red and processed meat was inversely associated with fruits, whole grain and nuts, and positively with potatoes, oil and coffee in both sexes. Results separately for the two types of meat were essentially similar. In a linear regression analysis, high consumption of meat was positively associated with body mass index in both men and women, even when using a model adjusted for all foods with a significant association with meat consumption in both sexes identified in this study. The association between meat consumption and a lower-quality diet may complicate studies on meat and health.
Weinberger, Sarah; Klarholz-Pevere, Carola; Liefeldt, Lutz; Baeder, Michael; Steckhan, Nico; Friedersdorff, Frank
2018-03-22
To analyse the influence of CT-based depth correction in the assessment of split renal function in potential living kidney donors. In 116 consecutive living kidney donors preoperative split renal function was assessed using the CT-based depth correction. Influence on donor side selection and postoperative renal function of the living kidney donors were analyzed. Linear regression analysis was performed to identify predictors of postoperative renal function. A left versus right kidney depth variation of more than 1 cm was found in 40/114 donors (35%). 11 patients (10%) had a difference of more than 5% in relative renal function after depth correction. Kidney depth variation and changes in relative renal function after depth correction would have had influence on side selection in 30 of 114 living kidney donors. CT depth correction did not improve the predictability of postoperative renal function of the living kidney donor. In general, it was not possible to predict the postoperative renal function from preoperative total and relative renal function. In multivariate linear regression analysis, age and BMI were identified as most important predictors for postoperative renal function of the living kidney donors. Our results clearly indicate that concerning the postoperative renal function of living kidney donors, the relative renal function of the donated kidney seems to be less important than other factors. A multimodal assessment with consideration of all available results including kidney size, location of the kidney and split renal function remains necessary.
Singing voice handicap and videostrobolaryngoscopy in healthy professional singers.
Castelblanco, Liliana; Habib, Michael; Stein, Daniel J; de Quadros, André; Cohen, Seth M; Noordzij, Jacob Pieter
2014-09-01
This study correlates the Singing Voice Handicap Index (SVHI) scores with videostrobolaryngoscopy in healthy professional singers as a measure of self-perceived vocal health versus actual pathology seen on examination. The objective was to measure the strength of self-assessment among professional singers and determine if there is a benefit of combining SVHI and videostrobolaryngoscopy for routine assessment of singers without an obvious singing voice problem. Prospective cross-sectional study. Forty-seven singers were included in the study. Singers produced spoken and sung pitches during videostrobolaryngoscopy. Examinations were blindly rated by two independent fellowship-trained laryngologists who assessed vocal fold appearance and function. The correlation between SVHI scores and total pathologic findings seen on videostrobolaryngoscopy was analyzed using linear regression and serial t tests. SVHI scores (mean of 22.45/144) were as expected for healthy singers. However, although all singers self-identified as healthy, laryngeal abnormalities were relatively common. The interrater reliability of total pathologic findings between two laryngologists was 71% (P = 0.006). Linear regression found no significant correlation (P = 0.9602) between SVHI scores and videostrobolaryngoscopy findings. Greater than expected laryngeal pathology was seen in these professional singers, who identified themselves as healthy, which possibly indicates a minimal impact on their singing voice and/or perception of vocal health. These findings demonstrate that laryngeal appearance alone does not dictate nor fully explain the sound or apparent health of a professional singer. Sustaining good vocal health is complex, and even experienced singers may not reliably assess the presence of pathology. Copyright © 2014 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
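The "error insensitivity" metaparameter mentioned in this abstract can be illustrated by comparing the per-residue penalties of the two linear approaches. The sketch below is an assumption-laden illustration rather than the authors' code: it uses the standard epsilon-insensitive loss for support vector regression and the squared loss for least squares, with hypothetical residuals.

```python
def squared_loss(residual):
    """Least-squares penalty: grows quadratically with the residual."""
    return residual ** 2


def eps_insensitive_loss(residual, eps):
    """SVR penalty: zero inside the eps tube, linear in the excess outside it."""
    return max(abs(residual) - eps, 0.0)


# Hypothetical prediction errors (predicted minus observed RSA).
residuals = [0.05, -0.2, 1.5]
eps = 0.1

ls_penalties = [squared_loss(r) for r in residuals]
svr_penalties = [eps_insensitive_loss(r, eps) for r in residuals]
```

Errors smaller than eps cost nothing under the SVR loss, and large errors are penalized linearly rather than quadratically, which makes the fit less sensitive to a few badly predicted residues.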
Regression Commonality Analysis: A Technique for Quantitative Theory Building
ERIC Educational Resources Information Center
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
ERIC Educational Resources Information Center
Jurs, Stephen; And Others
The scree test and its linear regression technique are reviewed, and results of its use in factor analysis and Delphi data sets are described. The scree test was originally a visual approach for making judgments about eigenvalues, which considered the relationships of the eigenvalues to one another as well as their actual values. The graph that is…
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method.
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2015-11-18
Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regression and quantile regression methods in SAS 9.2 statistical software. Of 8456 newborn babies, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three (6.8%) of the neonates weighed less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the birthweight of the neonates. However, sex and birth rank of the neonates, mothers' age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). This study revealed the results of multiple linear regression and quantile regression were not identical. We strongly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available.
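The reason quantile regression copes with asymmetric responses and outliers can be seen from its loss function. The following sketch is a simplification under stated assumptions: it fits an intercept-only model (no covariates, unlike the study), uses the standard check ("pinball") loss, and works on hypothetical birthweights.

```python
def pinball_loss(y, pred, tau):
    """Mean check loss of a constant prediction for quantile level tau."""
    total = 0.0
    for yi in y:
        diff = yi - pred
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y)


def best_constant(y, tau):
    """Search the observed values for the constant minimizing the check loss."""
    return min(sorted(y), key=lambda c: pinball_loss(y, c, tau))


# Hypothetical birthweights in grams, with one heavy outlier.
birthweights = [2400, 2900, 3000, 3100, 3200, 3300, 4100]

median_fit = best_constant(birthweights, 0.5)  # the sample median
low_fit = best_constant(birthweights, 0.1)     # a low quantile
mean_fit = sum(birthweights) / len(birthweights)
```

The least-squares fit (the mean) is pulled toward the outlier, while the tau = 0.5 fit stays at the median; changing tau targets other parts of the distribution, such as low birthweight.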
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
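The "repeated partitioning of a sample into subgroups" can be made concrete with a single regression-tree split. The sketch below assumes the splitting criterion is the within-group sum of squared errors, which is the usual choice for regression trees but is not stated in the abstract; the data and function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost) of the split on x that minimizes total SSE of y."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical predictor and continuous outcome with two clear subgroups.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, cost = best_split(x, y)
```

A full tree applies the same search recursively within each resulting subgroup until a stopping rule is met.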
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regression and quantile regression methods in SAS 9.2 statistical software. Results: Of 8456 newborn babies, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three (6.8%) of the neonates weighed less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the birthweight of the neonates. However, sex and birth rank of the neonates, mothers' age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strongly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
Some comparisons of complexity in dictionary-based and linear computational models.
Gnecco, Giorgio; Kůrková, Věra; Sanguineti, Marcello
2011-03-01
Neural networks provide a more flexible approximation of functions than traditional linear regression. In the latter, one can only adjust the coefficients in linear combinations of fixed sets of functions, such as orthogonal polynomials or Hermite functions, while for neural networks, one may also adjust the parameters of the functions which are being combined. However, some useful properties of linear approximators (such as uniqueness, homogeneity, and continuity of best approximation operators) are not satisfied by neural networks. Moreover, optimization of parameters in neural networks becomes more difficult than in linear regression. Experimental results suggest that these drawbacks of neural networks are offset by substantially lower model complexity, allowing accuracy of approximation even in high-dimensional cases. We give some theoretical results comparing requirements on model complexity for two types of approximators, the traditional linear ones and so called variable-basis types, which include neural networks, radial, and kernel models. We compare upper bounds on worst-case errors in variable-basis approximation with lower bounds on such errors for any linear approximator. Using methods from nonlinear approximation and integral representations tailored to computational units, we describe some cases where neural networks outperform any linear approximator. Copyright © 2010 Elsevier Ltd. All rights reserved.
Montoye, Alexander H K; Begum, Munni; Henning, Zachary; Pfeiffer, Karin A
2017-02-01
This study had three purposes, all related to evaluating energy expenditure (EE) prediction accuracy from body-worn accelerometers: (1) compare linear regression to linear mixed models, (2) compare linear models to artificial neural network models, and (3) compare accuracy of accelerometers placed on the hip, thigh, and wrists. Forty individuals performed 13 activities in a 90 min semi-structured, laboratory-based protocol. Participants wore accelerometers on the right hip, right thigh, and both wrists and a portable metabolic analyzer (EE criterion). Four EE prediction models were developed for each accelerometer: linear regression, linear mixed, and two ANN models. EE prediction accuracy was assessed using correlations, root mean square error (RMSE), and bias and was compared across models and accelerometers using repeated-measures analysis of variance. For all accelerometer placements, there were no significant differences for correlations or RMSE between linear regression and linear mixed models (correlations: r = 0.71-0.88, RMSE: 1.11-1.61 METs; p > 0.05). For the thigh-worn accelerometer, there were no differences in correlations or RMSE between linear and ANN models (ANN-correlations: r = 0.89, RMSE: 1.07-1.08 METs. Linear models-correlations: r = 0.88, RMSE: 1.10-1.11 METs; p > 0.05). Conversely, one ANN had higher correlations and lower RMSE than both linear models for the hip (ANN-correlation: r = 0.88, RMSE: 1.12 METs. Linear models-correlations: r = 0.86, RMSE: 1.18-1.19 METs; p < 0.05), and both ANNs had higher correlations and lower RMSE than both linear models for the wrist-worn accelerometers (ANN-correlations: r = 0.82-0.84, RMSE: 1.26-1.32 METs. Linear models-correlations: r = 0.71-0.73, RMSE: 1.55-1.61 METs; p < 0.01). For studies using wrist-worn accelerometers, machine learning models offer a significant improvement in EE prediction accuracy over linear models. 
Conversely, linear models showed similar EE prediction accuracy to machine learning models for hip- and thigh-worn accelerometers and may be viable alternative modeling techniques for EE prediction for hip- or thigh-worn accelerometers.
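The three accuracy measures used in this abstract (correlation, RMSE, and bias) capture different aspects of prediction error, which a small example makes visible. The values below are hypothetical MET readings, and the function names are illustrative only.

```python
import math


def rmse(pred, obs):
    """Root mean square error between predictions and criterion values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def bias(pred, obs):
    """Mean signed error; positive values indicate over-prediction."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical criterion and predicted energy expenditure in METs.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]  # consistently 0.5 METs too high

model_rmse = rmse(predicted, observed)
model_bias = bias(predicted, observed)
model_r = pearson_r(predicted, observed)
```

Here the correlation is perfect even though every prediction is off by 0.5 METs, which is why correlation alone is not a sufficient accuracy measure.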
Relationship between Gender Roles and Sexual Assertiveness in Married Women.
Azmoude, Elham; Firoozi, Mahbobe; Sadeghi Sahebzad, Elahe; Asgharipour, Neghar
2016-10-01
Evidence indicates that sexual assertiveness is one of the important factors affecting sexual satisfaction. According to some studies, traditional gender norms conflict with women's capability in expressing sexual desires. This study examined the relationship between gender roles and sexual assertiveness in married women in Mashhad, Iran. This cross-sectional study was conducted on 120 women who were referred to Mashhad health centers, selected through convenience sampling in 2014-15. Data were collected using Bem Sex Role Inventory (BSRI) and Hulbert index of sexual assertiveness. Data were analyzed using SPSS 16 by Pearson and Spearman's correlation tests and linear regression analysis. The mean score of sexual assertiveness was 54.93±13.20. According to the findings, there was no significant correlation of femininity or masculinity scores with sexual assertiveness (P=0.069 and P=0.080 respectively). Linear regression analysis indicated that among the predictor variables, only sexual function satisfaction was identified as a predictor of sexual assertiveness (P=0.001). Based on the results, sexual assertiveness in married women is not associated with gender role, but it is related to sexual function satisfaction. So, counseling psychologists need to consider this variable when designing intervention programs for modifying sexual assertiveness and find other variables that affect sexual assertiveness.
Yamakado, Minoru; Tanaka, Takayuki; Nagao, Kenji; Imaizumi, Akira; Komatsu, Michiharu; Daimon, Takashi; Miyano, Hiroshi; Tani, Mizuki; Toda, Akiko; Yamamoto, Hiroshi; Horimoto, Katsuhisa; Ishizaka, Yuko
2017-11-03
Fatty liver disease (FLD) increases the risk of diabetes, cardiovascular disease, and steatohepatitis, which leads to fibrosis, cirrhosis, and hepatocellular carcinoma. Thus, the early detection of FLD is necessary. We aimed to find a quantitative and feasible model for discriminating the FLD, based on plasma free amino acid (PFAA) profiles. We constructed models of the relationship between PFAA levels in 2,000 generally healthy Japanese subjects and the diagnosis of FLD by abdominal ultrasound scan by multiple logistic regression analysis with variable selection. The performance of these models for FLD discrimination was validated using an independent data set of 2,160 subjects. The generated PFAA-based model was able to identify FLD patients. The area under the receiver operating characteristic curve for the model was 0.83, which was higher than those of other existing liver function-associated markers ranging from 0.53 to 0.80. The value of the linear discriminant in the model yielded the adjusted odds ratio (with 95% confidence intervals) for a 1 standard deviation increase of 2.63 (2.14-3.25) in the multiple logistic regression analysis with known liver function-associated covariates. Interestingly, the linear discriminant values were significantly associated with the progression of FLD, and patients with nonalcoholic steatohepatitis also exhibited higher values.
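The area under the ROC curve reported in this abstract has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher score than a randomly chosen non-case. The sketch below computes it by pairwise comparison; the discriminant scores are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical discriminant scores for subjects with and without FLD.
fld_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2, 0.1]

model_auc = auc(fld_scores, healthy_scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so 0.83 indicates that about five in six case/non-case pairs are ordered correctly.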
Li, Siyue; Zhang, Quanfa
2011-06-15
Water samples were collected for determination of dissolved trace metals at 56 sampling sites throughout the upper Han River, China. Multivariate statistical analyses including correlation analysis, stepwise multiple linear regression models, and principal component and factor analysis (PCA/FA) were employed to examine the land use influences on trace metals, and a receptor model of factor analysis-multiple linear regression (FA-MLR) was used for source identification/apportionment of anthropogenic heavy metals in the surface water of the River. Our results revealed that land use was an important factor in water metals in the snow melt flow period and that land use in the riparian zone was not a better predictor of metals than land use away from the river. Urbanization in a watershed and vegetation along river networks could better explain metals, whereas agriculture, regardless of its relative location, only slightly explained metal variables in the upper Han River. FA-MLR analysis identified five source types of metals, and mining, fossil fuel combustion, and vehicle exhaust were the dominant pollution sources in the surface waters. The results demonstrated great impacts of human activities on metal concentrations in the subtropical river of China. Copyright © 2011 Elsevier B.V. All rights reserved.
Demographic and clinical features related to perceived discrimination in schizophrenia.
Fresán, Ana; Robles-García, Rebeca; Madrigal, Eduardo; Tovilla-Zarate, Carlos-Alfonso; Martínez-López, Nicolás; Arango de Montis, Iván
2018-04-01
Perceived discrimination contributes to the development of internalized stigma among those with schizophrenia. Evidence on demographic and clinical factors related to the perception of discrimination among this population is both contradictory and scarce in low- and middle-income countries. Accordingly, the main purpose of this study is to determine the demographic and clinical factors predicting the perception of discrimination among Mexican patients with schizophrenia. Two hundred and seventeen adults with paranoid schizophrenia completed an interview on their demographic status and clinical characteristics. Symptom severity was assessed using the Positive and Negative Syndrome Scale; and perceived discrimination using 13 items from the King's Internalized Stigma Scale. Bivariate linear associations were determined to identify the variables of interest to be included in a linear regression analysis. Years of education, age of illness onset and length of hospitalization were associated with discrimination. However, only age of illness onset and length of hospitalization emerged as predictors of perceived discrimination in the final regression analysis, with longer length of hospitalization being the independent variable with the greatest contribution. Fortunately, this is a modifiable factor regarding the perception of discrimination and self-stigma. Strategies for achieving this as part of community-based mental health care are also discussed. Copyright © 2017 Elsevier B.V. All rights reserved.
Relationship between Gender Roles and Sexual Assertiveness in Married Women
Azmoude, Elham; Firoozi, Mahbobe; Sadeghi Sahebzad, Elahe; Asgharipour, Neghar
2016-01-01
ABSTRACT Background: Evidence indicates that sexual assertiveness is one of the important factors affecting sexual satisfaction. According to some studies, traditional gender norms conflict with women’s capability in expressing sexual desires. This study examined the relationship between gender roles and sexual assertiveness in married women in Mashhad, Iran. Methods: This cross-sectional study was conducted on 120 women who were referred to Mashhad health centers, selected through convenience sampling in 2014-15. Data were collected using Bem Sex Role Inventory (BSRI) and Hulbert index of sexual assertiveness. Data were analyzed using SPSS 16 by Pearson and Spearman’s correlation tests and linear regression analysis. Results: The mean score of sexual assertiveness was 54.93±13.20. According to the findings, there was no significant correlation of femininity or masculinity scores with sexual assertiveness (P=0.069 and P=0.080 respectively). Linear regression analysis indicated that among the predictor variables, only sexual function satisfaction was identified as a predictor of sexual assertiveness (P=0.001). Conclusion: Based on the results, sexual assertiveness in married women is not associated with gender role, but it is related to sexual function satisfaction. So, counseling psychologists need to consider this variable when designing intervention programs for modifying sexual assertiveness and find other variables that affect sexual assertiveness. PMID:27713899
Cer, Regina Z; Herrera-Galeano, J Enrique; Anderson, Joseph J; Bishop-Lilly, Kimberly A; Mokashi, Vishwesh P
2014-01-01
Understanding the biological roles of microRNAs (miRNAs) is an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. Along with this increasing interest, many open-source bioinformatics tools to identify existing and/or discover novel miRNAs in next-generation sequencing (NGS) reads have become available. While miRNA identification and discovery tools have improved significantly, the development of miRNA differential expression analysis tools, especially for temporal studies, remains substantially challenging. Further, installing currently available software is non-trivial, and the steps of testing with example datasets, trying one's own dataset, and interpreting the results require notable expertise and time. Consequently, there is a strong need for a tool that allows scientists to normalize raw data, perform statistical analyses, and obtain intuitive results without having to invest significant effort. We have developed miRNA Temporal Analyzer (mirnaTA), a bioinformatics package to identify differentially expressed miRNAs in temporal studies. mirnaTA is written in Perl and R (Version 2.13.0 or later) and can be run across multiple platforms, such as Linux, Mac and Windows. In the current version, mirnaTA requires users to provide a simple, tab-delimited matrix file containing miRNA names and count data from a minimum of two to a maximum of 20 time points and three replicates. To recalibrate data and remove technical variability, raw data are normalized using Normal Quantile Transformation (NQT), and a linear regression model is used to locate any miRNAs that are differentially expressed in a linear pattern. Subsequently, the remaining miRNAs that do not fit a linear model are further analyzed using two alternative non-linear methods: 1) cumulative distribution function (CDF) or 2) analysis of variance (ANOVA).
After both linear and non-linear analyses are completed, statistically significant miRNAs (P < 0.05) are plotted as heat maps using hierarchical cluster analysis and Euclidean distance matrix computation methods. mirnaTA is an open-source bioinformatics tool to aid scientists in identifying differentially expressed miRNAs which could be further mined for biological significance. It is expected to provide researchers with a means of turning raw data into statistical summaries in a fast and intuitive manner.
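The first two steps described above (rank-based normal quantile normalization followed by a linear fit over time) can be sketched roughly as follows. This is an illustrative Python sketch, not mirnaTA's Perl/R code: the function names, the miRNA names and counts, the tie handling, the (rank - 0.5)/n plotting position, and the choice to normalize each time point across miRNAs are all assumptions, since the abstract does not specify them.

```python
from statistics import NormalDist, mean

def normal_quantile_transform(values):
    """Replace each value by the standard-normal quantile of its rank.

    Ties share their mean rank; (rank - 0.5) / n keeps every probability
    strictly inside (0, 1) so inv_cdf is always defined.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]

def linear_trend(times, values):
    """Ordinary least-squares slope, intercept and R^2 of values against times."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    syy = sum((v - v_bar) ** 2 for v in values)
    slope = sxy / sxx
    intercept = v_bar - slope * t_bar
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2

# Hypothetical counts: rows are miRNAs, columns are four time points.
counts = {
    "miR-a": [10, 20, 40, 80],
    "miR-b": [50, 45, 55, 50],
    "miR-c": [90, 60, 30, 5],
}
times = [0, 1, 2, 3]
names = list(counts)

# Normalize each time point (column) across miRNAs, then regroup by miRNA.
columns = [normal_quantile_transform([counts[m][t] for m in names])
           for t in range(len(times))]
normalized = {m: [columns[t][i] for t in range(len(times))]
              for i, m in enumerate(names)}

# One (slope, intercept, R^2) triple per miRNA.
trends = {m: linear_trend(times, normalized[m]) for m in names}
```

In this toy input, the hypothetical miR-a rises in rank over time and miR-c falls, so their fitted slopes have opposite signs. A significance test on the slope, which the abstract implies through its P < 0.05 threshold, is deliberately left out of the sketch.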
Diagnosis of Enzyme Inhibition Using Excel Solver: A Combined Dry and Wet Laboratory Exercise
ERIC Educational Resources Information Center
Dias, Albino A.; Pinto, Paula A.; Fraga, Irene; Bezerra, Rui M. F.
2014-01-01
In enzyme kinetic studies, linear transformations of the Michaelis-Menten equation, such as the Lineweaver-Burk double-reciprocal transformation, present some constraints. The linear transformation distorts the experimental error and the relationship between "x" and "y" axes; consequently, linear regression of transformed data…
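The contrast between fitting the double-reciprocal line and fitting the untransformed Michaelis-Menten equation can be sketched as below. This is a Python illustration rather than the Excel Solver exercise itself; the function names, the search bounds for Km, the grid-plus-golden-section search, and the noise-free data are all assumptions made for the example.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate v = Vmax * s / (Km + s)."""
    return vmax * s / (km + s)

def lineweaver_burk_fit(s, v):
    """Estimate (Vmax, Km) from a straight-line fit of 1/v against 1/s.

    The line is 1/v = (Km/Vmax) * (1/s) + 1/Vmax.
    """
    x = [1 / si for si in s]
    y = [1 / vi for vi in v]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    vmax = 1 / intercept
    return vmax, slope * vmax

def direct_fit(s, v, km_lo=1e-3, km_hi=1e3, grid_size=200, iters=100):
    """Least-squares fit on the untransformed data.

    For a fixed Km the best Vmax has a closed form, so only Km is searched:
    a log-spaced grid brackets the minimum, golden-section search refines it.
    The bounds km_lo and km_hi are arbitrary assumptions.
    """
    def best_vmax(km):
        g = [si / (km + si) for si in s]
        return sum(gi * vi for gi, vi in zip(g, v)) / sum(gi * gi for gi in g)

    def ssr(km):
        vmax = best_vmax(km)
        return sum((vi - michaelis_menten(si, vmax, km)) ** 2
                   for si, vi in zip(s, v))

    grid = [km_lo * (km_hi / km_lo) ** (i / grid_size)
            for i in range(grid_size + 1)]
    best = min(range(grid_size + 1), key=lambda i: ssr(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, grid_size)]
    phi = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if ssr(c) < ssr(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = (a + b) / 2
    return best_vmax(km), km

# Hypothetical noise-free data generated with Vmax = 10 and Km = 2.
substrate = [0.5, 1.0, 2.0, 4.0, 8.0]
rates = [michaelis_menten(si, 10.0, 2.0) for si in substrate]

lb_vmax, lb_km = lineweaver_burk_fit(substrate, rates)
nl_vmax, nl_km = direct_fit(substrate, rates)
```

With error-free data both routes recover the same parameters. Once measurement error is present, taking reciprocals inflates the error of the smallest rates, so the two estimates generally diverge, which is the distortion the abstract refers to.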
Poor methodological quality and reporting standards of systematic reviews in burn care management.
Wasiak, Jason; Tyack, Zephanie; Ware, Robert; Goodwin, Nicholas; Faggion, Clovis M
2017-10-01
The methodological and reporting quality of burn-specific systematic reviews has not been established. The aim of this study was to evaluate the methodological quality of systematic reviews in burn care management. Computerised searches were performed in Ovid MEDLINE, Ovid EMBASE and The Cochrane Library through to February 2016 for systematic reviews relevant to burn care using medical subject and free-text terms such as 'burn', 'systematic review' or 'meta-analysis'. Additional studies were identified by hand-searching five discipline-specific journals. Two authors independently screened papers, extracted and evaluated methodological quality using the 11-item A Measurement Tool to Assess Systematic Reviews (AMSTAR) tool and reporting quality using the 27-item Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist. Characteristics of systematic reviews associated with methodological and reporting quality were identified. Descriptive statistics and linear regression identified features associated with improved methodological quality. A total of 60 systematic reviews met the inclusion criteria. Six of the 11 AMSTAR items reporting on 'a priori' design, duplicate study selection, grey literature, included/excluded studies, publication bias and conflict of interest were reported in less than 50% of the systematic reviews. Of the 27 items listed for PRISMA, 13 items reporting on introduction, methods, results and the discussion were addressed in less than 50% of systematic reviews. 
Multivariable analyses showed that systematic reviews with higher methodological or reporting quality incorporated a meta-analysis (AMSTAR regression coefficient 2.1; 95% CI: 1.1, 3.1; PRISMA regression coefficient 6.3; 95% CI: 3.8, 8.7), were published in the Cochrane Library (AMSTAR regression coefficient 2.9; 95% CI: 1.6, 4.2; PRISMA regression coefficient 6.1; 95% CI: 3.1, 9.2), and included a randomised controlled trial (AMSTAR regression coefficient 1.4; 95% CI: 0.4, 2.4; PRISMA regression coefficient 3.4; 95% CI: 0.9, 5.8). The methodological and reporting quality of systematic reviews in burn care requires further improvement with stricter adherence by authors to the PRISMA checklist and AMSTAR tool. © 2016 Medicalhelplines.com Inc and John Wiley & Sons Ltd.
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is used to estimate the heteroscedastic function; the coefficients of the regression model are then obtained by the generalized least squares method. One noteworthy feature of our approach is that, by improving the traditional two-stage method, we avoid testing for heteroscedasticity. Because local polynomial estimation is a non-parametric technique, the form of the heteroscedastic function need not be known, so estimation precision can be improved when that function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal, based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation on real data indicate that our approach is effective in finite-sample situations.
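The general two-stage idea (estimate the variance function non-parametrically, then reweight the regression) can be sketched as follows. This Python sketch is a simplified single-predictor version of the traditional scheme, not the authors' improved multivariate method. The Gaussian kernel, the bandwidth `h`, the variance floor, the function names and the simulated data are all assumptions.

```python
import math
import random

def weighted_line_fit(x, y, w):
    """Weighted least-squares intercept and slope for y ~ b0 + b1 * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def local_linear(x, z, x0, h):
    """Local linear (degree-1 local polynomial) estimate of E[z | x = x0]."""
    kernel = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    b0, b1 = weighted_line_fit(x, z, kernel)
    return b0 + b1 * x0

def two_stage_fit(x, y, h=0.15, floor_frac=0.05):
    """Stage 1: OLS residuals. Stage 2: smooth the squared residuals to get a
    variance estimate, then refit with weights 1 / variance."""
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))
    sq_resid = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    # Local linear fits can dip below zero, so clip at an arbitrary floor.
    floor = floor_frac * sum(sq_resid) / len(sq_resid)
    var_hat = [max(local_linear(x, sq_resid, xi, h), floor) for xi in x]
    coefs = weighted_line_fit(x, y, [1.0 / v for v in var_hat])
    return coefs, var_hat

# Simulated data (assumed for illustration): y = 1 + 2x + noise whose
# standard deviation grows with x.
random.seed(0)
n = 200
x = [i / (n - 1) for i in range(n)]
y = [1.0 + 2.0 * xi + (0.1 + 0.5 * xi) * random.gauss(0.0, 1.0) for xi in x]

(b0, b1), var_hat = two_stage_fit(x, y)
```

Because the variance is smoothed rather than modelled parametrically, no functional form for the heteroscedasticity has to be chosen, which is the property the abstract emphasises. The bandwidth would normally be selected from the data rather than fixed as it is here.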
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike classification algorithms, clustering belongs to the unsupervised type of algorithms. Two representative clustering algorithms are the K-means and the expectation maximization (EM) algorithms. Logistic regression extends linear regression analysis to a categorical dependent variable, which it models through a linear combination of independent variables, and is a statistical approach to predicting the probability that an event occurs. However, classifying all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, logistic regression analysis is applied to clusters produced by EM and by the K-means method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
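For orientation, the K-means step named above can be sketched as a plain Lloyd iteration in Python. The function names, the two-dimensional toy points and the choice of initial centers are assumptions; the EM clustering and the logistic regression stage of the paper are not reproduced here.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kmeans(points, centers, iters=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center update.

    `centers` holds the initial centers (tuples). A cluster that loses all
    its members keeps its previous center.
    """
    centers = list(centers)
    labels = []
    for _ in range(iters):
        # Assignment step: label each point with its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels, centers

# Hypothetical, well-separated toy data with one initial center per group.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
labels, centers = kmeans(points, [points[0], points[3]])
```

K-means makes hard assignments like these, whereas EM for a mixture model assigns each point a membership probability per cluster. The resulting cluster labels could then serve as input to a separate classifier, as the abstract describes.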