Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P value
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Quality of life in breast cancer patients--a quantile regression analysis.
Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma
2008-01-01
Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Common pitfalls in statistical analysis: Linear regression analysis
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis. PMID:28447022
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. This clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods is available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-06-01
This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
Radio Propagation Prediction Software for Complex Mixed Path Physical Channels
2006-08-14
63 4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz 69 4.4.7. Projected Scaling to...4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz In order to construct a comprehensive numerical algorithm capable of
Applied Multiple Linear Regression: A General Research Strategy
ERIC Educational Resources Information Center
Smith, Brandon B.
1969-01-01
Illustrates some of the basic concepts and procedures for using regression analysis in experimental design, analysis of variance, analysis of covariance, and curvilinear regression. Applications to evaluation of instruction and vocational education programs are illustrated. (GR)
A simplified competition data analysis for radioligand specific activity determination.
Venturino, A; Rivera, E S; Bergoc, R M; Caro, R A
1990-01-01
Non-linear regression and two-step linear fit methods were developed to determine the actual specific activity of 125I-ovine prolactin by radioreceptor self-displacement analysis. The experimental results obtained by the different methods are superposable. The non-linear regression method is considered to be the most adequate procedure to calculate the specific activity, but if its software is not available, the other described methods are also suitable.
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
Conjoint Analysis: A Study of the Effects of Using Person Variables.
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.
1983-09-01
books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Stanford, Applied Linear Regression , Wiley, 1980. 40
Teaching the Concept of Breakdown Point in Simple Linear Regression.
ERIC Educational Resources Information Center
Chan, Wai-Sum
2001-01-01
Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…
Local Linear Regression for Data with AR Errors.
Li, Runze; Li, Yan
2009-07-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
ERIC Educational Resources Information Center
Rocconi, Louis M.
2011-01-01
Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) to from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Maintenance Operations in Mission Oriented Protective Posture Level IV (MOPPIV)
1987-10-01
Repair FADAC Printed Circuit Board ............. 6 3. Data Analysis Techniques ............................. 6 a. Multiple Linear Regression... ANALYSIS /DISCUSSION ............................... 12 1. Exa-ple of Regression Analysis ..................... 12 S2. Regression results for all tasks...6 * TABLE 9. Task Grouping for Analysis ........................ 7 "TABXLE 10. Remove/Replace H60A3 Power Pack................. 8 TABLE
Biostatistics Series Module 6: Correlation and Linear Regression.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient ( r ). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r 2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation ( y = a + bx ), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
Biostatistics Series Module 6: Correlation and Linear Regression
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous. PMID:27904175
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija
2018-01-01
The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrP C ) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.
Quantile Regression in the Study of Developmental Sciences
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M
2017-04-01
A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C 14 -C 18 ) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and non linear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by non linear regression method. The non linear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q 0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and non-linear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET d ) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert
2012-01-01
Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, p<0.001). Univariate mixture model fits of FDGpre improved R2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Prediction of the Main Engine Power of a New Container Ship at the Preliminary Design Stage
NASA Astrophysics Data System (ADS)
Cepowski, Tomasz
2017-06-01
The paper presents mathematical relationships that allow us to forecast the estimated main engine power of new container ships, based on data concerning vessels built in 2005-2015. The presented approximations allow us to estimate the engine power based on the length between perpendiculars and the number of containers the ship will carry. The approximations were developed using simple linear regression and multivariate linear regression analysis. The presented relations have practical application for estimation of container ship engine power needed in preliminary parametric design of the ship. It follows from the above that the use of multiple linear regression to predict the main engine power of a container ship brings more accurate solutions than simple linear regression.
Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C
2011-09-01
Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma
2016-01-01
Aim: This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666
Study on power grid characteristics in summer based on Linear regression analysis
NASA Astrophysics Data System (ADS)
Tang, Jin-hui; Liu, You-fei; Liu, Juan; Liu, Qiang; Liu, Zhuan; Xu, Xi
2018-05-01
The correlation analysis of power load and temperature is the precondition and foundation for accurate load prediction, and a great deal of research has been made. This paper constructed the linear correlation model between temperature and power load, then the correlation of fault maintenance work orders with the power load is researched. Data details of Jiangxi province in 2017 summer such as temperature, power load, fault maintenance work orders were adopted in this paper to develop data analysis and mining. Linear regression models established in this paper will promote electricity load growth forecast, fault repair work order review, distribution network operation weakness analysis and other work to further deepen the refinement.
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
ERIC Educational Resources Information Center
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-01
Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case control study, but Cox proportional hazard model is often used in survival data analysis. Most literature only refer to main effect model, however, generalized linear model differs from general linear model, and the interaction was composed of multiplicative interaction and additive interaction. The former is only statistical significant, but the latter has biological significance. In this paper, macros was written by using SAS 9.4 and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the items of logistic and Cox regression interactions, and the confidence intervals of Wald, delta and profile likelihood were used to evaluate additive interaction for the reference in big data analysis in clinical epidemiology and in analysis of genetic multiplicative and additive interactions.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Regression Commonality Analysis: A Technique for Quantitative Theory Building
ERIC Educational Resources Information Center
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seong W. Lee
During this reporting period, the literature survey including the gasifier temperature measurement literature, the ultrasonic application and its background study in cleaning application, and spray coating process are completed. The gasifier simulator (cold model) testing has been successfully conducted. Four factors (blower voltage, ultrasonic application, injection time intervals, particle weight) were considered as significant factors that affect the temperature measurement. The Analysis of Variance (ANOVA) was applied to analyze the test data. The analysis shows that all four factors are significant to the temperature measurements in the gasifier simulator (cold model). The regression analysis for the case with the normalizedmore » room temperature shows that linear model fits the temperature data with 82% accuracy (18% error). The regression analysis for the case without the normalized room temperature shows 72.5% accuracy (27.5% error). The nonlinear regression analysis indicates a better fit than that of the linear regression. The nonlinear regression model's accuracy is 88.7% (11.3% error) for normalized room temperature case, which is better than the linear regression analysis. The hot model thermocouple sleeve design and fabrication are completed. The gasifier simulator (hot model) design and the fabrication are completed. The system tests of the gasifier simulator (hot model) have been conducted and some modifications have been made. Based on the system tests and results analysis, the gasifier simulator (hot model) has met the proposed design requirement and the ready for system test. The ultrasonic cleaning method is under evaluation and will be further studied for the gasifier simulator (hot model) application. The progress of this project has been on schedule.« less
ERIC Educational Resources Information Center
Dolan, Conor V.; Wicherts, Jelte M.; Molenaar, Peter C. M.
2004-01-01
We consider the question of how variation in the number and reliability of indicators affects the power to reject the hypothesis that the regression coefficients are zero in latent linear regression analysis. We show that power remains constant as long as the coefficient of determination remains unchanged. Any increase in the number of indicators…
ERIC Educational Resources Information Center
Jurs, Stephen; And Others
The scree test and its linear regression technique are reviewed, and results of its use in factor analysis and Delphi data sets are described. The scree test was originally a visual approach for making judgments about eigenvalues, which considered the relationships of the eigenvalues to one another as well as their actual values. The graph that is…
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis
ERIC Educational Resources Information Center
Luo, Wen; Azen, Razia
2013-01-01
Dominance analysis (DA) is a method used to evaluate the relative importance of predictors that was originally proposed for linear regression models. This article proposes an extension of DA that allows researchers to determine the relative importance of predictors in hierarchical linear models (HLM). Commonly used measures of model adequacy in…
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
2016-08-01
The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA an CA. Copyright © 2016 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
Some Applied Research Concerns Using Multiple Linear Regression Analysis.
ERIC Educational Resources Information Center
Newman, Isadore; Fraas, John W.
The intention of this paper is to provide an overall reference on how a researcher can apply multiple linear regression in order to utilize the advantages that it has to offer. The advantages and some concerns expressed about the technique are examined. A number of practical ways by which researchers can deal with such concerns as…
Automating approximate Bayesian computation by local linear regression.
Thornton, Kevin R
2009-07-07
In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular.Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, the ABCreg simplifies implementing ABC based on local-linear regression.
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Classical Testing in Functional Linear Models.
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
Classical Testing in Functional Linear Models
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155
Advanced microwave soil moisture studies. [Big Sioux River Basin, Iowa
NASA Technical Reports Server (NTRS)
Dalsted, K. J.; Harlan, J. C.
1983-01-01
Comparisons of low level L-band brightness temperature (TB) and thermal infrared (TIR) data as well as the following data sets: soil map and land cover data; direct soil moisture measurement; and a computer generated contour map were statistically evaluated using regression analysis and linear discriminant analysis. Regression analysis of footprint data shows that statistical groupings of ground variables (soil features and land cover) hold promise for qualitative assessment of soil moisture and for reducing variance within the sampling space. Dry conditions appear to be more conductive to producing meaningful statistics than wet conditions. Regression analysis using field averaged TB and TIR data did not approach the higher sq R values obtained using within-field variations. The linear discriminant analysis indicates some capacity to distinguish categories with the results being somewhat better on a field basis than a footprint basis.
Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.
Ritz, Christian; Van der Vliet, Leana
2009-09-01
The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions-variance homogeneity and normality-that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi
2013-09-01
Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore, examples are given as how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis and it significantly saves time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear method. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo second order model were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression method showed that Ho pseudo second order model was a better kinetic expression when compared to other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear method to obtain the pseudo isotherms. The best fitting pseudo isotherm was found to be the Langmuir and Redlich Peterson isotherm. Redlich Peterson is a special case of Langmuir when the constant g equals unity.
Schistosomiasis Breeding Environment Situation Analysis in Dongting Lake Area
NASA Astrophysics Data System (ADS)
Li, Chuanrong; Jia, Yuanyuan; Ma, Lingling; Liu, Zhaoyan; Qian, Yonggang
2013-01-01
Monitoring environmental characteristics, such as vegetation, soil moisture et al., of Oncomelania hupensis (O. hupensis)’ spatial/temporal distribution is of vital importance to the schistosomiasis prevention and control. In this study, the relationship between environmental factors derived from remotely sensed data and the density of O. hupensis was analyzed by a multiple linear regression model. Secondly, spatial analysis of the regression residual was investigated by the semi-variogram method. Thirdly, spatial analysis of the regression residual and the multiple linear regression model were both employed to estimate the spatial variation of O. hupensis density. Finally, the approach was used to monitor and predict the spatial and temporal variations of oncomelania of Dongting Lake region, China. And the areas of potential O. hupensis habitats were predicted and the influence of Three Gorges Dam (TGB)project on the density of O. hupensis was analyzed.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A
2013-01-01
Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
Danish North Sea Fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) that was started from the age of Paleocene to Miocene. In this study, the integration of seismic and well log data set is carried out to determine the chalk sand distribution in the Danish North Sea field. The integration of seismic and well log data set is performed by using the seismic inversion analysis and seismic multi-attribute. The seismic inversion algorithm, which is used to derive acoustic impedance (AI), is model-based technique. The derived AI is then used as external attributes for the input of multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate the linear and non-linear transformation of among well log properties. In the case of the linear model, selected transformation is conducted by weighting step-wise linear regression (SWR), while for the non-linear model is performed by using probabilistic neural networks (PNN). The estimated porosity, which is resulted by PNN shows better suited to the well log data compared with the results of SWR. This result can be understood since PNN perform non-linear regression so that the relationship between the attribute data and predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity value ranging from 23% up to 30%.
Local linear regression for function learning: an analysis based on sample discrepancy.
Cervellera, Cristiano; Macciò, Danilo
2014-11-01
Local linear regression models, a kind of nonparametric structures that locally perform a linear estimation of the target function, are analyzed in the context of empirical risk minimization (ERM) for function learning. The analysis is carried out with emphasis on geometric properties of the available data. In particular, the discrepancy of the observation points used both to build the local regression models and compute the empirical risk is considered. This allows to treat indifferently the case in which the samples come from a random external source and the one in which the input space can be freely explored. Both consistency of the ERM procedure and approximating capabilities of the estimator are analyzed, proving conditions to ensure convergence. Since the theoretical analysis shows that the estimation improves as the discrepancy of the observation points becomes smaller, low-discrepancy sequences, a family of sampling methods commonly employed for efficient numerical integration, are also analyzed. Simulation results involving two different examples of function learning are provided.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Abunama, Taher; Othman, Faridah
2017-06-01
Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to guarantee a sufficient treatment of wastewater before discharging it to the environment. The main objectives of this study are to statistically analyze and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years’ weekly influent data (156weeks) has been conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) have been tried to select the most fitted model, which was utilized to forecast the wastewater inflow rates. The linear regression analysis was applied to testify the correlation between the observed and predicted influents. ARIMA (3, 1, 3) model was selected with the highest significance R-square and lowest normalized Bayesian Information Criterion (BIC) value, and accordingly the wastewater inflow rates were forecasted to additional 52weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.
Dinç, Erdal; Ozdemir, Abdil
2005-01-01
Multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of multivariate chromatographic calibration technique is based on the use of the linear regression equations constructed using relationship between concentration and peak area at the five-wavelength set. The algorithm of this mathematical calibration model having a simple mathematical content was briefly described. This approach is a powerful mathematical tool for an optimum chromatographic multivariate calibration and elimination of fluctuations coming from instrumental and experimental conditions. This multivariate chromatographic calibration contains reduction of multivariate linear regression functions to univariate data set. The validation of model was carried out by analyzing various synthetic binary mixtures and using the standard addition technique. Developed calibration technique was applied to the analysis of the real pharmaceutical tablets containing EA and HCT. The obtained results were compared with those obtained by classical HPLC method. It was observed that the proposed multivariate chromatographic calibration gives better results than classical HPLC.
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis and depreciate the significance of discriminant function and discrimination abilities of individual variables in discrimination analysis. The discussions will be restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experiment results, which the authors believe is very critical to research progress in theory development and cumulative knowledge in the ergonomics field.
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.
Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-04-01
To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
Effect of Contact Damage on the Strength of Ceramic Materials.
1982-10-01
variables that are important to erosion, and a multivariate , linear regression analysis is used to fit the data to the dimensional analysis. The...of Equations 7 and 8 by a multivariable regression analysis (room tem- perature data) Exponent Regression Standard error Computed coefficient of...1980) 593. WEAVER, Proc. Brit. Ceram. Soc. 22 (1973) 125. 39. P. W. BRIDGMAN, "Dimensional Analaysis ", (Yale 18. R. W. RICE, S. W. FREIMAN and P. F
Brown, A M
2001-06-01
The objective of this present study was to introduce a simple, easily understood method for carrying out non-linear regression analysis based on user input functions. While it is relatively straightforward to fit data with simple functions such as linear or logarithmic functions, fitting data with more complicated non-linear functions is more difficult. Commercial specialist programmes are available that will carry out this analysis, but these programmes are expensive and are not intuitive to learn. An alternative method described here is to use the SOLVER function of the ubiquitous spreadsheet programme Microsoft Excel, which employs an iterative least squares fitting routine to produce the optimal goodness of fit between data and function. The intent of this paper is to lead the reader through an easily understood step-by-step guide to implementing this method, which can be applied to any function in the form y=f(x), and is well suited to fast, reliable analysis of data in all fields of biology.
Suzuki, Taku; Iwamoto, Takuji; Shizu, Kanae; Suzuki, Katsuji; Yamada, Harumoto; Sato, Kazuki
2017-05-01
This retrospective study was designed to investigate prognostic factors for postoperative outcomes for cubital tunnel syndrome (CubTS) using multiple logistic regression analysis with a large number of patients. Eighty-three patients with CubTS who underwent surgeries were enrolled. The following potential prognostic factors for disease severity were selected according to previous reports: sex, age, type of surgery, disease duration, body mass index, cervical lesion, presence of diabetes mellitus, Workers' Compensation status, preoperative severity, and preoperative electrodiagnostic testing. Postoperative severity of disease was assessed 2 years after surgery by Messina's criteria which is an outcome measure specifically for CubTS. Bivariate analysis was performed to select candidate prognostic factors for multiple linear regression analyses. Multiple logistic regression analysis was conducted to identify the association between postoperative severity and selected prognostic factors. Both bivariate and multiple linear regression analysis revealed only preoperative severity as an independent risk factor for poor prognosis, while other factors did not show any significant association. Although conflicting results exist regarding prognosis of CubTS, this study supports evidence from previous studies and concludes early surgical intervention portends the most favorable prognosis. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1972-01-01
Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y = αx + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
Multivariate meta-analysis for non-linear and other multi-parameter associations
Gasparrini, A; Armstrong, B; Kenward, M G
2012-01-01
In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
ERIC Educational Resources Information Center
Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.
1999-01-01
A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
Analysis of Learning Curve Fitting Techniques.
1987-09-01
1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using -# ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et. al., Applied
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interests in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each regions of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interests for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors, the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. Non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data
Ying, Gui-shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-01-01
Purpose To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. Methods We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field data in the elderly. Results When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI −0.03 to 0.32D, P=0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, P=0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller P-values, while analysis of the worse eye provided larger P-values than mixed effects models and marginal models. Conclusion In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision. PMID:28102741
NASA Technical Reports Server (NTRS)
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
The Use of Linear Programming for Prediction.
ERIC Educational Resources Information Center
Schnittjer, Carl J.
The purpose of the study was to develop a linear programming model to be used for prediction, test the accuracy of the predictions, and compare the accuracy with that produced by curvilinear multiple regression analysis. (Author)
NASA Astrophysics Data System (ADS)
Jiang, Weiping; Ma, Jun; Li, Zhao; Zhou, Xiaohui; Zhou, Boye
2018-05-01
The analysis of the correlations between the noise in different components of GPS stations has positive significance to those trying to obtain more accurate uncertainty of velocity with respect to station motion. Previous research into noise in GPS position time series focused mainly on single component evaluation, which affects the acquisition of precise station positions, the velocity field, and its uncertainty. In this study, before and after removing the common-mode error (CME), we performed one-dimensional linear regression analysis of the noise amplitude vectors in different components of 126 GPS stations with a combination of white noise, flicker noise, and random walking noise in Southern California. The results show that, on the one hand, there are above-moderate degrees of correlation between the white noise amplitude vectors in all components of the stations before and after removal of the CME, while the correlations between flicker noise amplitude vectors in horizontal and vertical components are enhanced from un-correlated to moderately correlated by removing the CME. On the other hand, the significance tests show that, all of the obtained linear regression equations, which represent a unique function of the noise amplitude in any two components, are of practical value after removing the CME. According to the noise amplitude estimates in two components and the linear regression equations, more accurate noise amplitudes can be acquired in the two components.
NASA Astrophysics Data System (ADS)
Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.
2015-06-01
This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11-year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (support vector regression, neural networks) besides the multiple linear regression approach. The analysis was applied to several current reanalysis data sets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim to compare how these types of data resolve especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the tropical stratosphere were found to be in qualitative agreement with previous attribution studies, although the agreement with observational results was incomplete, especially for JRA-55. The analysis also pointed to the solar signal in the ozone data sets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. The results obtained by linear regression were confirmed by the nonlinear approach through all data sets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. The seasonal evolution of the solar response was also discussed in terms of dynamical causalities in the winter hemispheres. The hypothetical mechanism of a weaker Brewer-Dobson circulation at solar maxima was reviewed together with a discussion of polar vortex behaviour.
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
Tan, Kok Chooi; Lim, Hwee San; Matjafri, Mohd Zubir; Abdullah, Khiruddin
2012-06-01
Atmospheric corrections for multi-temporal optical satellite images are necessary, especially in change detection analyses, such as normalized difference vegetation index (NDVI) rationing. Abrupt change detection analysis using remote-sensing techniques requires radiometric congruity and atmospheric correction to monitor terrestrial surfaces over time. Two atmospheric correction methods were used for this study: relative radiometric normalization and the simplified method for atmospheric correction (SMAC) in the solar spectrum. A multi-temporal data set consisting of two sets of Landsat images from the period between 1991 and 2002 of Penang Island, Malaysia, was used to compare NDVI maps, which were generated using the proposed atmospheric correction methods. Land surface temperature (LST) was retrieved using ATCOR3_T in PCI Geomatica 10.1 image processing software. Linear regression analysis was utilized to analyze the relationship between NDVI and LST. This study reveals that both of the proposed atmospheric correction methods yielded high accuracy through examination of the linear correlation coefficients. To check for the accuracy of the equation obtained through linear regression analysis for every single satellite image, 20 points were randomly chosen. The results showed that the SMAC method yielded a constant value (in terms of error) to predict the NDVI value from linear regression analysis-derived equation. The errors (average) from both proposed atmospheric correction methods were less than 10%.
Kim, Seong-Gil
2018-01-01
Background The purpose of this study was to investigate the effect of ankle ROM and lower-extremity muscle strength on static balance control ability in young adults. Material/Methods This study was conducted with 65 young adults, but 10 young adults dropped out during the measurement, so 55 young adults (male: 19, female: 36) completed the study. Postural sway (length and velocity) was measured with eyes open and closed, and ankle ROM (AROM and PROM of dorsiflexion and plantarflexion) and lower-extremity muscle strength (flexor and extensor of hip, knee, and ankle joint) were measured. Pearson correlation coefficient was used to examine the correlation between variables and static balance ability. Simple linear regression analysis and multiple linear regression analysis were used to examine the effect of variables on static balance ability. Results In correlation analysis, plantarflexion ROM (AROM and PROM) and lower-extremity muscle strength (except hip extensor) were significantly correlated with postural sway (p<0.05). In simple correlation analysis, all variables that passed the correlation analysis procedure had significant influence (p<0.05). In multiple linear regression analysis, plantar flexion PROM with eyes open significantly influenced sway length (B=0.681) and sway velocity (B=0.011). Conclusions Lower-extremity muscle strength and ankle plantarflexion ROM influenced static balance control ability, with ankle plantarflexion PROM showing the greatest influence. Therefore, both contractile structures and non-contractile structures should be of interest when considering static balance control ability improvement. PMID:29760375
Kim, Seong-Gil; Kim, Wan-Soo
2018-05-15
BACKGROUND The purpose of this study was to investigate the effect of ankle ROM and lower-extremity muscle strength on static balance control ability in young adults. MATERIAL AND METHODS This study was conducted with 65 young adults, but 10 young adults dropped out during the measurement, so 55 young adults (male: 19, female: 36) completed the study. Postural sway (length and velocity) was measured with eyes open and closed, and ankle ROM (AROM and PROM of dorsiflexion and plantarflexion) and lower-extremity muscle strength (flexor and extensor of hip, knee, and ankle joint) were measured. Pearson correlation coefficient was used to examine the correlation between variables and static balance ability. Simple linear regression analysis and multiple linear regression analysis were used to examine the effect of variables on static balance ability. RESULTS In correlation analysis, plantarflexion ROM (AROM and PROM) and lower-extremity muscle strength (except hip extensor) were significantly correlated with postural sway (p<0.05). In simple correlation analysis, all variables that passed the correlation analysis procedure had significant influence (p<0.05). In multiple linear regression analysis, plantar flexion PROM with eyes open significantly influenced sway length (B=0.681) and sway velocity (B=0.011). CONCLUSIONS Lower-extremity muscle strength and ankle plantarflexion ROM influenced static balance control ability, with ankle plantarflexion PROM showing the greatest influence. Therefore, both contractile structures and non-contractile structures should be of interest when considering static balance control ability improvement.
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
Reliability Analysis of the Gradual Degradation of Semiconductor Devices.
1983-07-20
under the heading of linear models or linear statistical models . 3 ,4 We have not used this material in this report. Assuming catastrophic failure when...assuming a catastrophic model . In this treatment we first modify our system loss formula and then proceed to the actual analysis. II. ANALYSIS OF...Failure Time 1 Ti Ti 2 T2 T2 n Tn n and are easily analyzed by simple linear regression. Since we have assumed a log normal/Arrhenius activation
Wavelet regression model in forecasting crude oil price
NASA Astrophysics Data System (ADS)
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.
TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis
NASA Astrophysics Data System (ADS)
Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.
2016-02-01
In this paper, the kinetic analysis of Li-Zn ferrite synthesis was studied using thermogravimetry (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG-curves obtained for the four heating rates and Netzsch Thermokinetics software package, the kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG-curves clearly suggest a two-step process for the ferrite synthesis and therefore a model-fitting kinetic analysis based on multivariate non-linear regressions was conducted. The complex reaction was described by a two-step reaction scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Yander three-dimensional diffusion model at the first stage and Ginstling-Bronstein model at the second step. The kinetic parameters for lithium-zinc ferrite synthesis reaction were found and discussed.
Two Paradoxes in Linear Regression Analysis.
Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong
2016-12-25
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
NASA Astrophysics Data System (ADS)
Gusriani, N.; Firdaniza
2018-03-01
The existence of outliers on multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the Least Square method is forcedly used on these data, it will produce a model that cannot represent most data. For that, we need a robust regression method against outliers. This paper will compare the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contains outliers. Based on the robust determinant coefficient value, MCD method produces a better model compared to TELBS method.
Interpreting Regression Results: beta Weights and Structure Coefficients are Both Important.
ERIC Educational Resources Information Center
Thompson, Bruce
Various realizations have led to less frequent use of the "OVA" methods (analysis of variance--ANOVA--among others) and to more frequent use of general linear model approaches such as regression. However, too few researchers understand all the various coefficients produced in regression. This paper explains these coefficients and their…
Quantile Regression in the Study of Developmental Sciences
ERIC Educational Resources Information Center
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, the multiple linear will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
Zhou, Qing-he; Xiao, Wang-pin; Shen, Ying-yan
2014-07-01
The spread of spinal anesthesia is highly unpredictable. In patients with increased abdominal girth and short stature, a greater cephalad spread after a fixed amount of subarachnoidally administered plain bupivacaine is often observed. We hypothesized that there is a strong correlation between abdominal girth/vertebral column length and cephalad spread. Age, weight, height, body mass index, abdominal girth, and vertebral column length were recorded for 114 patients. The L3-L4 interspace was entered, and 3 mL of 0.5% plain bupivacaine was injected into the subarachnoid space. The cephalad spread (loss of temperature sensation and loss of pinprick discrimination) was assessed 30 minutes after intrathecal injection. Linear regression analysis was performed for age, weight, height, body mass index, abdominal girth, vertebral column length, and the spread of spinal anesthesia, and the combined linear contribution of age up to 55 years, weight, height, abdominal girth, and vertebral column length was tested by multiple regression analysis. Linear regression analysis showed that there was a significant univariate correlation among all 6 patient characteristics evaluated and the spread of spinal anesthesia (all P < 0.039) except for age and loss of temperature sensation (P > 0.068). Multiple regression analysis showed that abdominal girth and the vertebral column length were the key determinants for spinal anesthesia spread (both P < 0.0001), whereas age, weight, and height could be omitted without changing the results (all P > 0.059, all 95% confidence limits < 0.372). Multiple regression analysis revealed that the combination of a patient's 5 general characteristics, especially abdominal girth and vertebral column length, had a high predictive value for the spread of spinal anesthesia after a given dose of plain bupivacaine.
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
Estimation of standard liver volume in Chinese adult living donors.
Fu-Gui, L; Lu-Nan, Y; Bo, L; Yong, Z; Tian-Fu, W; Ming-Qing, X; Wen-Tao, W; Zhe-Yu, C
2009-12-01
To determine a formula predicting the standard liver volume based on body surface area (BSA) or body weight in Chinese adults. A total of 115 consecutive right-lobe living donors not including the middle hepatic vein underwent right hemi-hepatectomy. No organs were used from prisoners, and no subjects were prisoners. Donor anthropometric data including age, gender, body weight, and body height were recorded prospectively. The weights and volumes of the right lobe liver grafts were measured at the back table. Liver weights and volumes were calculated from the right lobe graft weight and volume obtained at the back table, divided by the proportion of the right lobe on computed tomography. By simple linear regression analysis and stepwise multiple linear regression analysis, we correlated calculated liver volume and body height, body weight, or body surface area. The subjects had a mean age of 35.97 +/- 9.6 years, and a female-to-male ratio of 60:55. The mean volume of the right lobe was 727.47 +/- 136.17 mL, occupying 55.59% +/- 6.70% of the whole liver by computed tomography. The volume of the right lobe was 581.73 +/- 96.137 mL, and the estimated liver volume was 1053.08 +/- 167.56 mL. Females of the same body weight showed a slightly lower liver weight. By simple linear regression analysis and stepwise multiple linear regression analysis, a formula was derived based on body weight. All formulae except the Hong Kong formula overestimated liver volume compared to this formula. The formula of standard liver volume, SLV (mL) = 11.508 x body weight (kg) + 334.024, may be applied to estimate liver volumes in Chinese adults.
Koper, Olga Martyna; Kamińska, Joanna; Milewska, Anna; Sawicki, Karol; Mariak, Zenon; Kemona, Halina; Matowicka-Karna, Joanna
2018-05-18
The influence of isoform A of reticulon-4 (Nogo-A), also known as neurite outgrowth inhibitor, on primary brain tumor development was reported. Therefore the aim was the evaluation of Nogo-A concentrations in cerebrospinal fluid (CSF) and serum of brain tumor patients compared with non-tumoral individuals. All serum results, except for two cases, obtained both in brain tumors and non-tumoral individuals, were below the lower limit of ELISA detection. Cerebrospinal fluid Nogo-A concentrations were significantly lower in primary brain tumor patients compared to non-tumoral individuals. The univariate linear regression analysis found that if white blood cell count increases by 1 × 10 3 /μL, the mean cerebrospinal fluid Nogo-A concentration value decreases 1.12 times. In the model of multiple linear regression analysis predictor variables influencing cerebrospinal fluid Nogo-A concentrations included: diagnosis, sex, and sodium level. The mean cerebrospinal fluid Nogo-A concentration value was 1.9 times higher for women in comparison to men. In the astrocytic brain tumor group higher sodium level occurs with lower cerebrospinal fluid Nogo-A concentrations. We found the opposite situation in non-tumoral individuals. Univariate linear regression analysis revealed, that cerebrospinal fluid Nogo-A concentrations change in relation to white blood cell count. In the created model of multiple linear regression analysis we found, that within predictor variables influencing CSF Nogo-A concentrations were diagnosis, sex, and sodium level. Results may be relevant to the search for cerebrospinal fluid biomarkers and potential therapeutic targets in primary brain tumor patients. Nogo-A concentrations were tested by means of enzyme-linked immunosorbent assay (ELISA).
A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.
Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit
2018-06-01
In this study, when the assumptions of linearity and homogeneity of regression slopes of conventional ANCOVA are not met, a new approach named as SEYHAN has been suggested to use conventional ANCOVA instead of robust or nonlinear ANCOVA. The proposed SEYHAN's approach involves transformation of continuous covariate into categorical structure when the relationship between covariate and dependent variable is nonlinear and the regression slopes are not homogenous. A simulated data set was used to explain SEYHAN's approach. In this approach, we performed conventional ANCOVA in each subgroup which is constituted according to knot values and analysis of variance with two-factor model after MARS method was used for categorization of covariate. The first model is a simpler model than the second model that includes interaction term. Since the model with interaction effect has more subjects, the power of test also increases and the existing significant difference is revealed better. We can say that linearity and homogeneity of regression slopes are not problem for data analysis by conventional linear ANCOVA model by helping this approach. It can be used fast and efficiently for the presence of one or more covariates.
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
Two Paradoxes in Linear Regression Analysis
FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong
2016-01-01
Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
The third molar development (TMD) has been widely utilized as one of the radiographic method for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated to the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² yielded an increment in the linear regressions of combined model as compared to the individual models. The overall decrease in RMSE was detected in combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, low value of adjusted R(2) and high RMSE except in male were exhibited in combined model. Dental age estimation is better predicted using combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
NASA Technical Reports Server (NTRS)
1980-01-01
MATHPAC image-analysis library is collection of general-purpose mathematical and statistical routines and special-purpose data-analysis and pattern-recognition routines for image analysis. MATHPAC library consists of Linear Algebra, Optimization, Statistical-Summary, Densities and Distribution, Regression, and Statistical-Test packages.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
Marrero-Ponce, Yovani; Medina-Marrero, Ricardo; Castillo-Garit, Juan A; Romero-Zaldivar, Vicente; Torrens, Francisco; Castro, Eduardo A
2005-04-15
A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole protein) and local (one or more amino acid) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid level biochemical descriptors are based on the calculation of linear maps on Rn[f k(xmi):Rn-->Rn] in canonical basis. These bio-macromolecular indices are calculated from the kth power of the macromolecular pseudograph alpha-carbon atom adjacency matrix. Total linear indices are linear functional on Rn. That is, the kth total linear indices are linear maps from Rn to the scalar R[f k(xm):Rn-->R]. Thus, the kth total linear indices are calculated by summing the amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in the Arc repressor illustrates this approach. A quantitative model that discriminates near wild-type stability alanine mutants from the reduced-stability ones in a training series was obtained. This model permitted the correct classification of 97.56% (40/41) and 91.67% (11/12) of proteins in the training and test set, respectively. It shows a high Matthews correlation coefficient (MCC=0.952) for the training set and an MCC=0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model (Rcanc=0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. On the other hand, the linear piecewise regression model compared favorably with respect to the linear regression one on predicting the melting temperature (tm) of the Arc alanine mutants. The linear model explains almost 81% of the variance of the experimental tm (R=0.90 and s=4.29) and the LOO press statistics evidenced its predictive ability (q2=0.72 and scv=4.79). Moreover, the TOMOCOMD-CAMPS method produced a linear piecewise regression (R=0.97) between protein backbone descriptors and tm values for alanine mutants of the Arc repressor. A break-point value of 51.87 degrees C characterized two mutant clusters and coincided perfectly with the experimental scale. For this reason, we can use the linear discriminant analysis and piecewise models in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted the interpretation of the driving forces of such folding process, indicating that topologic/topographic protein backbone interactions control the stability profile of wild-type Arc and its alanine mutants.
1981-09-01
corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regres- sion, as enumerated in...7. Data related to the sample of bases is assumed to be representative of bases in the population. Limitations Basic limitations on this research were... Ratemaking --Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seong W. Lee
2004-10-01
The systematic tests of the gasifier simulator on the clean thermocouple were completed in this reporting period. Within the systematic tests on the clean thermocouple, five (5) factors were considered as the experimental parameters including air flow rate, water flow rate, fine dust particle amount, ammonia addition and high/low frequency device (electric motor). The fractional factorial design method was used in the experiment design with sixteen (16) data sets of readings. Analysis of Variances (ANOVA) was applied to the results from systematic tests. The ANOVA results show that the un-balanced motor vibration frequency did not have the significant impact onmore » the temperature changes in the gasifier simulator. For the fine dust particles testing, the amount of fine dust particles has significant impact to the temperature measurements in the gasifier simulator. The effects of the air and water on the temperature measurements show the same results as reported in the previous report. The ammonia concentration was included as an experimental parameter for the reducing environment in this reporting period. The ammonia concentration does not seem to be a significant factor on the temperature changes. The linear regression analysis was applied to the temperature reading with five (5) factors. The accuracy of the linear regression is relatively low, which is less than 10% accuracy. Nonlinear regression was also conducted to the temperature reading with the same factors. Since the experiments were designed in two (2) levels, the nonlinear regression is not very effective with the dataset (16 readings). An extra central point test was conducted. With the data of the center point testing, the accuracy of the nonlinear regression is much better than the linear regression.« less
NASA Astrophysics Data System (ADS)
Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried
2018-03-01
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease ( P < 0.001) in rice yield, pan evaporation, solar radiation, and wind speed declined significantly. Eight principal components exhibited an eigenvalue > 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimized planting dates for improved radiation use efficiency in the study area.
NASA Technical Reports Server (NTRS)
Barrett, C. A.
1985-01-01
Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important accounting for 60 percent of the explained variability hot corrosion attack.
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Statistical approach to the analysis of olive long-term pollen season trends in southern Spain.
García-Mozo, H; Yaezel, L; Oteros, J; Galán, C
2014-03-01
Analysis of long-term airborne pollen counts makes it possible not only to chart pollen-season trends but also to track changing patterns in flowering phenology. Changes in higher plant response over a long interval are considered among the most valuable bioindicators of climate change impact. Phenological-trend models can also provide information regarding crop production and pollen-allergen emission. The interest of this information makes essential the election of the statistical analysis for time series study. We analysed trends and variations in the olive flowering season over a 30-year period (1982-2011) in southern Europe (Córdoba, Spain), focussing on: annual Pollen Index (PI); Pollen Season Start (PSS), Peak Date (PD), Pollen Season End (PSE) and Pollen Season Duration (PSD). Apart from the traditional Linear Regression analysis, a Seasonal-Trend Decomposition procedure based on Loess (STL) and an ARIMA model were performed. Linear regression results indicated a trend toward delayed PSE and earlier PSS and PD, probably influenced by the rise in temperature. These changes are provoking longer flowering periods in the study area. The use of the STL technique provided a clearer picture of phenological behaviour. Data decomposition on pollination dynamics enabled the trend toward an alternate bearing cycle to be distinguished from the influence of other stochastic fluctuations. Results pointed to show a rising trend in pollen production. With a view toward forecasting future phenological trends, ARIMA models were constructed to predict PSD, PSS and PI until 2016. Projections displayed a better goodness of fit than those derived from linear regression. Findings suggest that olive reproductive cycle is changing considerably over the last 30years due to climate change. Further conclusions are that STL improves the effectiveness of traditional linear regression in trend analysis, and ARIMA models can provide reliable trend projections for future years taking into account the internal fluctuations in time series. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia
Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative data base from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
Bias due to two-stage residual-outcome regression analysis in genetic association studies.
Demissie, Serkalem; Cupples, L Adrienne
2011-11-01
Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual- or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, first, a residual-outcome is calculated from a regression of the outcome variable on covariates and then the relationship between the adjusted-outcome and the SNP is evaluated by a simple linear regression of the adjusted-outcome on the SNP. In this article, we examine the performance of this two-stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach results in biased genotypic effect and loss of power. Bias is always toward the null and increases with the squared-correlation between the SNP and the covariate (). For example, for , 0.1, and 0.5, two-stage analysis results in, respectively, 0, 10, and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR under , the two -stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided. © 2011 Wiley Periodicals, Inc.
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
2014-01-01
Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Investigating the detection of multi-homed devices independent of operating systems
2017-09-01
timestamp data was used to estimate clock skews using linear regression and linear optimization methods. Analysis revealed that detection depends on...the consistency of the estimated clock skew. Through vertical testing, it was also shown that clock skew consistency depends on the installed...optimization methods. Analysis revealed that detection depends on the consistency of the estimated clock skew. Through vertical testing, it was also
NASA Astrophysics Data System (ADS)
Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua
2018-03-01
The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Weighted functional linear regression models for gene-based association analysis.
Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I
2018-01-01
Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
How is the weather? Forecasting inpatient glycemic control
Saulnier, George E; Castro, Janna C; Cook, Curtiss B; Thompson, Bithika M
2017-01-01
Aim: Apply methods of damped trend analysis to forecast inpatient glycemic control. Method: Observed and calculated point-of-care blood glucose data trends were determined over 62 weeks. Mean absolute percent error was used to calculate differences between observed and forecasted values. Comparisons were drawn between model results and linear regression forecasting. Results: The forecasted mean glucose trends observed during the first 24 and 48 weeks of projections compared favorably to the results provided by linear regression forecasting. However, in some scenarios, the damped trend method changed inferences compared with linear regression. In all scenarios, mean absolute percent error values remained below the 10% accepted by demand industries. Conclusion: Results indicate that forecasting methods historically applied within demand industries can project future inpatient glycemic control. Additional study is needed to determine if forecasting is useful in the analyses of other glucometric parameters and, if so, how to apply the techniques to quality improvement. PMID:29134125
Improving power and robustness for detecting genetic association with extreme-value sampling design.
Chen, Hua Yun; Li, Mingyao
2011-12-01
Extreme-value sampling design that samples subjects with extremely large or small quantitative trait values is commonly used in genetic association studies. Samples in such designs are often treated as "cases" and "controls" and analyzed using logistic regression. Such a case-control analysis ignores the potential dose-response relationship between the quantitative trait and the underlying trait locus and thus may lead to loss of power in detecting genetic association. An alternative approach to analyzing such data is to model the dose-response relationship by a linear regression model. However, parameter estimation from this model can be biased, which may lead to inflated type I errors. We propose a robust and efficient approach that takes into consideration of both the biased sampling design and the potential dose-response relationship. Extensive simulations demonstrate that the proposed method is more powerful than the traditional logistic regression analysis and is more robust than the linear regression analysis. We applied our method to the analysis of a candidate gene association study on high-density lipoprotein cholesterol (HDL-C) which includes study subjects with extremely high or low HDL-C levels. Using our method, we identified several SNPs showing a stronger evidence of association with HDL-C than the traditional case-control logistic regression analysis. Our results suggest that it is important to appropriately model the quantitative traits and to adjust for the biased sampling when dose-response relationship exists in extreme-value sampling designs. © 2011 Wiley Periodicals, Inc.
do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro
2015-01-01
Abstract Objective: To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. Methods: This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a confidence interval of 95%, was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criteria for variable inclusion in the multiple linear regression model was the association with the dependent variable in the simple linear regression analysis, considering p<0.20. Significance level was α <5%. Results: From 226 women included, 200 (88.5%) were 20-44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. Conclusions: This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. PMID:26100593
Solar cycle in current reanalyses: (non)linear attribution study
NASA Astrophysics Data System (ADS)
Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.
2014-12-01
This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11 year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (Support Vector Regression, Neural Networks) besides the traditional linear approach. The analysis was applied to several current reanalysis datasets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim to compare how this type of data resolves especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the lower and upper stratosphere were found to be sufficiently robust and in qualitative agreement with previous observational studies. The analysis also pointed to the solar signal in the ozone datasets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. Consequently the results obtained by linear regression were confirmed by the nonlinear approach through all datasets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. Furthermore, the seasonal dependence of the solar response was also discussed, mainly as a source of dynamical causalities in the wave propagation characteristics in the zonal wind and the induced meridional circulation in the winter hemispheres. The hypothetical mechanism of a weaker Brewer Dobson circulation was reviewed together with discussion of polar vortex stability.
Application of linear regression analysis in accuracy assessment of rolling force calculations
NASA Astrophysics Data System (ADS)
Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.
1998-10-01
Efficient operation of the computational models employed in process control systems require periodical assessment of the accuracy of their predictions. Linear regression is proposed as a tool which allows separate systematic and random prediction errors from those related to measurements. A quantitative characteristic of the model predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example for the application. However, the outlined approach can be used to assess the performance of any computational model.
On the equivalence of case-crossover and time series methods in environmental epidemiology.
Lu, Yun; Zeger, Scott L
2007-04-01
The case-crossover design was introduced in epidemiology 15 years ago as a method for studying the effects of a risk factor on a health event using only cases. The idea is to compare a case's exposure immediately prior to or during the case-defining event with that same person's exposure at otherwise similar "reference" times. An alternative approach to the analysis of daily exposure and case-only data is time series analysis. Here, log-linear regression models express the expected total number of events on each day as a function of the exposure level and potential confounding variables. In time series analyses of air pollution, smooth functions of time and weather are the main confounders. Time series and case-crossover methods are often viewed as competing methods. In this paper, we show that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure such as in air pollution studies. This equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models. Time series log-linear regression accounts for overdispersion of the Poisson variance, while case-crossover analyses typically do not. This equivalence also permits model checking for case-crossover data using standard log-linear model diagnostics.
Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf
2015-01-01
The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200–400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037
The measurement of linear frequency drift in oscillators
NASA Astrophysics Data System (ADS)
Barnes, J. A.
1985-04-01
A linear drift in frequency is an important element in most stochastic models of oscillator performance. Quartz crystal oscillators often have drifts in excess of a part in ten to the tenth power per day. Even commercial cesium beam devices often show drifts of a few parts in ten to the thirteenth per year. There are many ways to estimate the drift rates from data samples (e.g., regress the phase on a quadratic; regress the frequency on a linear; compute the simple mean of the first difference of frequency; use Kalman filters with a drift term as one element in the state vector; and others). Although most of these estimators are unbiased, they vary in efficiency (i.e., confidence intervals). Further, the estimation of confidence intervals using the standard analysis of variance (typically associated with the specific estimating technique) can give amazingly optimistic results. The source of these problems is not an error in, say, the regressions techniques, but rather the problems arise from correlations within the residuals. That is, the oscillator model is often not consistent with constraints on the analysis technique or, in other words, some specific analysis techniques are often inappropriate for the task at hand. The appropriateness of a specific analysis technique is critically dependent on the oscillator model and can often be checked with a simple whiteness test on the residuals.
Darsazan, Bahar; Shafaati, Alireza; Mortazavi, Seyed Alireza; Zarghi, Afshin
2017-01-01
A simple and reliable stability-indicating RP-HPLC method was developed and validated for analysis of adefovir dipivoxil (ADV).The chromatographic separation was performed on a C 18 column using a mixture of acetonitrile-citrate buffer (10 mM at pH 5.2) 36:64 (%v/v) as mobile phase, at a flow rate of 1.5 mL/min. Detection was carried out at 260 nm and a sharp peak was obtained for ADV at a retention time of 5.8 ± 0.01 min. No interferences were observed from its stress degradation products. The method was validated according to the international guidelines. Linear regression analysis of data for the calibration plot showed a linear relationship between peak area and concentration over the range of 0.5-16 μg/mL; the regression coefficient was 0.9999and the linear regression equation was y = 24844x-2941.3. The detection (LOD) and quantification (LOQ) limits were 0.12 and 0.35 μg/mL, respectively. The results proved the method was fast (analysis time less than 7 min), precise, reproducible, and accurate for analysis of ADV over a wide range of concentration. The proposed specific method was used for routine quantification of ADV in pharmaceutical bulk and a tablet dosage form.
Ludbrook, John
2010-07-01
1. There are two reasons for wanting to compare measurers or methods of measurement. One is to calibrate one method or measurer against another; the other is to detect bias. Fixed bias is present when one method gives higher (or lower) values across the whole range of measurement. Proportional bias is present when one method gives values that diverge progressively from those of the other. 2. Linear regression analysis is a popular method for comparing methods of measurement, but the familiar ordinary least squares (OLS) method is rarely acceptable. The OLS method requires that the x values are fixed by the design of the study, whereas it is usual that both y and x values are free to vary and are subject to error. In this case, special regression techniques must be used. 3. Clinical chemists favour techniques such as major axis regression ('Deming's method'), the Passing-Bablok method or the bivariate least median squares method. Other disciplines, such as allometry, astronomy, biology, econometrics, fisheries research, genetics, geology, physics and sports science, have their own preferences. 4. Many Monte Carlo simulations have been performed to try to decide which technique is best, but the results are almost uninterpretable. 5. I suggest that pharmacologists and physiologists should use ordinary least products regression analysis (geometric mean regression, reduced major axis regression): it is versatile, can be used for calibration or to detect bias and can be executed by hand-held calculator or by using the loss function in popular, general-purpose, statistical software.
NASA Astrophysics Data System (ADS)
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
Trend Analysis Using Microcomputers.
ERIC Educational Resources Information Center
Berger, Carl F.
A trend analysis statistical package and additional programs for the Apple microcomputer are presented. They illustrate strategies of data analysis suitable to the graphics and processing capabilities of the microcomputer. The programs analyze data sets using examples of: (1) analysis of variance with multiple linear regression; (2) exponential…
Lundblad, Runar; Abdelnoor, Michel; Svennevig, Jan Ludvig
2004-09-01
Simple linear resection and endoventricular patch plasty are alternative techniques to repair postinfarction left ventricular aneurysm. The aim of the study was to compare these 2 methods with regard to early mortality and long-term survival. We retrospectively reviewed 159 patients undergoing operations between 1989 and 2003. The epidemiologic design was of an exposed (simple linear repair, n = 74) versus nonexposed (endoventricular patch plasty, n = 85) cohort with 2 endpoints: early mortality and long-term survival. The crude effect of aneurysm repair technique versus endpoint was estimated by odds ratio, rate ratio, or relative risk and their 95% confidence intervals. Stratification analysis by using the Mantel-Haenszel method was done to quantify confounders and pinpoint effect modifiers. Adjustment for multiconfounders was performed by using logistic regression and Cox regression analysis. Survival curves were analyzed with the Breslow test and the log-rank test. Early mortality was 8.2% for all patients, 13.5% after linear repair and 3.5% after endoventricular patch plasty. When adjusted for multiconfounders, the risk of early mortality was significantly higher after simple linear repair than after endoventricular patch plasty (odds ratio, 4.4; 95% confidence interval, 1.1-17.8). Mean follow-up was 5.8 +/- 3.8 years (range, 0-14.0 years). Overall 5-year cumulative survival was 78%, 70.1% after linear repair and 91.4% after endoventricular patch plasty. The risk of total mortality was significantly higher after linear repair than after endoventricular patch plasty when controlled for multiconfounders (relative risk, 4.5; 95% confidence interval, 2.0-9.7). Linear repair dominated early in the series and patch plasty dominated later, giving a possible learning-curve bias in favor of patch plasty that could not be adjusted for in the regression analysis. Postinfarction left ventricular aneurysm can be repaired with satisfactory early and late results. Surgical risk was lower and long-term survival was higher after endoventricular patch plasty than simple linear repair. Differences in outcome should be interpreted with care because of the retrospective study design and the chronology of the 2 repair methods.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Tolerance of ciliated protozoan Paramecium bursaria (Protozoa, Ciliophora) to ammonia and nitrites
NASA Astrophysics Data System (ADS)
Xu, Henglong; Song, Weibo; Lu, Lu; Alan, Warren
2005-09-01
The tolerance to ammonia and nitrites in freshwater ciliate Paramecium bursaria was measured in a conventional open system. The ciliate was exposed to different concentrations of ammonia and nitrites for 2h and 12h in order to determine the lethal concentrations. Linear regression analysis revealed that the 2h-LC50 value for ammonia was 95.94 mg/L and for nitrite 27.35 mg/L using probit scale method (with 95% confidence intervals). There was a linear correlation between the mortality probit scale and logarithmic concentration of ammonia which fit by a regression equation y=7.32 x 9.51 ( R 2=0.98; y, mortality probit scale; x, logarithmic concentration of ammonia), by which 2 h-LC50 value for ammonia was found to be 95.50 mg/L. A linear correlation between mortality probit scales and logarithmic concentration of nitrite is also followed the regression equation y=2.86 x+0.89 ( R 2=0.95; y, mortality probit scale; x, logarithmic concentration of nitrite). The regression analysis of toxicity curves showed that the linear correlation between exposed time of ammonia-N LC50 value and ammonia-N LC50 value followed the regression equation y=2 862.85 e -0.08 x ( R 2=0.95; y, duration of exposure to LC50 value; x, LC50 value), and that between exposed time of nitrite-N LC50 value and nitrite-N LC50 value followed the regression equation y=127.15 e -0.13 x ( R 2=0.91; y, exposed time of LC50 value; x, LC50 value). The results demonstrate that the tolerance to ammonia in P. bursaria is considerably higher than that of the larvae or juveniles of some metozoa, e.g. cultured prawns and oysters. In addition, ciliates, as bacterial predators, are likely to play a positive role in maintaining and improving water quality in aquatic environments with high-level ammonium, such as sewage treatment systems.
Modeling Longitudinal Data Containing Non-Normal Within Subject Errors
NASA Technical Reports Server (NTRS)
Feiveson, Alan; Glenn, Nancy L.
2013-01-01
The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively few number of subjects; typically 10 – 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed–effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed–effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.
Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C
2011-04-01
The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included into this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.
Quasi-likelihood generalized linear regression analysis of fatality risk data
DOT National Transportation Integrated Search
2009-01-01
Transportation-related fatality risks is a function of many interacting human, vehicle, and environmental factors. Statisitcally valid analysis of such data is challenged both by the complexity of plausable structural models relating fatality rates t...
NASA Technical Reports Server (NTRS)
Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.
1991-01-01
A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semiempirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produces predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis of fully-dense materials are in good agreement with those calculated from elastic properties.
NASA Technical Reports Server (NTRS)
Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.
1990-01-01
A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semi-empirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produced predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis for fully-dense materials are in good agreement with those calculated from elastic properties.
Non-stationary hydrologic frequency analysis using B-spline quantile regression
NASA Astrophysics Data System (ADS)
Nasri, B.; Bouezmarni, T.; St-Hilaire, A.; Ouarda, T. B. M. J.
2017-11-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic and water resources systems under the assumption of stationarity. However, with increasing evidence of climate change, it is possible that the assumption of stationarity, which is prerequisite for traditional frequency analysis and hence, the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extremes based on B-Spline quantile regression which allows to model data in the presence of non-stationarity and/or dependence on covariates with linear and non-linear dependence. A Markov Chain Monte Carlo (MCMC) algorithm was used to estimate quantiles and their posterior distributions. A coefficient of determination and Bayesian information criterion (BIC) for quantile regression are used in order to select the best model, i.e. for each quantile, we choose the degree and number of knots of the adequate B-spline quantile regression model. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in the variable of interest and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for an annual maximum and minimum discharge with high annual non-exceedance probabilities.
ERIC Educational Resources Information Center
Baylor, Carolyn; Yorkston, Kathryn; Bamer, Alyssa; Britton, Deanna; Amtmann, Dagmar
2010-01-01
Purpose: To explore variables associated with self-reported communicative participation in a sample (n = 498) of community-dwelling adults with multiple sclerosis (MS). Method: A battery of questionnaires was administered online or on paper per participant preference. Data were analyzed using multiple linear backward stepwise regression. The…
Logarithmic Transformations in Regression: Do You Transform Back Correctly?
ERIC Educational Resources Information Center
Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.
2009-01-01
The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
Yoneoka, Daisuke; Henmi, Masayuki
2017-06-01
Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Wilson, R. M.; Reichmann, E. J.; Teuber, D. L.
1984-01-01
An empirical method is developed to predict certain parameters of future solar activity cycles. Sunspot cycle statistics are examined, and curve fitting and linear regression analysis techniques are utilized.
Hayashi, K; Yamada, T; Sawa, T
2015-03-01
The return or Poincaré plot is a non-linear analytical approach in a two-dimensional plane, where a timed signal is plotted against itself after a time delay. Its scatter pattern reflects the randomness and variability in the signals. Quantification of a Poincaré plot of the electroencephalogram has potential to determine anaesthesia depth. We quantified the degree of dispersion (i.e. standard deviation, SD) along the diagonal line of the electroencephalogram-Poincaré plot (named as SD1/SD2), and compared SD1/SD2 values with spectral edge frequency 95 (SEF95) and bispectral index values. The regression analysis showed a tight linear regression equation with a coefficient of determination (R(2) ) value of 0.904 (p < 0.0001) between the Poincaré index (SD1/SD2) and SEF95, and a moderate linear regression equation between SD1/SD2 and bispectral index (R(2) = 0.346, p < 0.0001). Quantification of the Poincaré plot tightly correlates with SEF95, reflecting anaesthesia-dependent changes in electroencephalogram oscillation. © 2014 The Association of Anaesthetists of Great Britain and Ireland.
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case
NASA Astrophysics Data System (ADS)
Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann
2017-04-01
Short-term ocean analyses for Sea Surface Temperature SST in the Mediterranean Sea can be improved by a statistical post-processing technique, called super-ensemble. This technique consists in a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable to prevent overfitting problems, even if best performances are achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performances depend on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: 15 days training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least square algorithm being filtered a posteriori.
Mansilha, C; Melo, A; Rebelo, H; Ferreira, I M P L V O; Pinho, O; Domingues, V; Pinho, C; Gameiro, P
2010-10-22
A multi-residue methodology based on a solid phase extraction followed by gas chromatography-tandem mass spectrometry was developed for trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine disrupting properties. Matrix standard calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample to compensate matrix-induced chromatographic response enhancement observed for certain pesticides. Validation was done mainly according to the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticides analysis and/or GC-MS methodology. As the assumption of homoscedasticity was not met for analytical data, weighted least squares linear regression procedure was applied as a simple and effective way to counteract the greater influence of the greater concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limit of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness. Copyright © 2010 Elsevier B.V. All rights reserved.
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
NASA Astrophysics Data System (ADS)
Wang, Hongliang; Liu, Baohua; Ding, Zhongjun; Wang, Xiangxin
2017-02-01
Absorption-based optical sensors have been developed for the determination of water pH. In this paper, based on the preparation of a transparent sol-gel thin film with a phenol red (PR) indicator, several calculation methods, including simple linear regression analysis, quadratic regression analysis and dual-wavelength absorbance ratio analysis, were used to calculate water pH. Results of MSSRR show that dual-wavelength absorbance ratio analysis can improve the calculation accuracy of water pH in long-term measurement.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Modeling containment of large wildfires using generalized linear mixed-model analysis
Mark Finney; Isaac C. Grenfell; Charles W. McHugh
2009-01-01
Billions of dollars are spent annually in the United States to contain large wildland fires, but the factors contributing to suppression success remain poorly understood. We used a regression model (generalized linear mixed-model) to model containment probability of individual fires, assuming that containment was a repeated-measures problem (fixed effect) and...
Huang, Li-Shan; Myers, Gary J; Davidson, Philip W; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W; Cernichiari, Elsa; Shamlaye, Conrad F; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W
2007-11-01
Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age 9 years. The analyses for the most recent 9-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated non-linearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of 21 endpoints available at age 9 years, we chose the Weschler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other 9-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels.
Protocol Analysis as a Tool in Function and Task Analysis
1999-10-01
Autocontingency The use of log-linear and logistic regression methods to analyse sequential data seems appealing , and is strongly advocated by...collection and analysis of observational data. Behavior Research Methods, Instruments, and Computers, 23(3), 415-429. Patrick, J. D. (1991). Snob : A
ERIC Educational Resources Information Center
Games, Paul A.
1975-01-01
A brief introduction is presented on how multiple regression and linear model techniques can handle data analysis situations that most educators and psychologists think of as appropriate for analysis of variance. (Author/BJG)
Aptel, Florent; Sayous, Romain; Fortoul, Vincent; Beccat, Sylvain; Denis, Philippe
2010-12-01
To evaluate and compare the regional relationships between visual field sensitivity and retinal nerve fiber layer (RNFL) thickness as measured by spectral-domain optical coherence tomography (OCT) and scanning laser polarimetry. Prospective cross-sectional study. One hundred and twenty eyes of 120 patients (40 with healthy eyes, 40 with suspected glaucoma, and 40 with glaucoma) were tested on Cirrus-OCT, GDx VCC, and standard automated perimetry. Raw data on RNFL thickness were extracted for 256 peripapillary sectors of 1.40625 degrees each for the OCT measurement ellipse and 64 peripapillary sectors of 5.625 degrees each for the GDx VCC measurement ellipse. Correlations between peripapillary RNFL thickness in 6 sectors and visual field sensitivity in the 6 corresponding areas were evaluated using linear and logarithmic regression analysis. Receiver operating curve areas were calculated for each instrument. With spectral-domain OCT, the correlations (r(2)) between RNFL thickness and visual field sensitivity ranged from 0.082 (nasal RNFL and corresponding visual field area, linear regression) to 0.726 (supratemporal RNFL and corresponding visual field area, logarithmic regression). By comparison, with GDx-VCC, the correlations ranged from 0.062 (temporal RNFL and corresponding visual field area, linear regression) to 0.362 (supratemporal RNFL and corresponding visual field area, logarithmic regression). In pairwise comparisons, these structure-function correlations were generally stronger with spectral-domain OCT than with GDx VCC and with logarithmic regression than with linear regression. The largest areas under the receiver operating curve were seen for OCT superior thickness (0.963 ± 0.022; P < .001) in eyes with glaucoma and for OCT average thickness (0.888 ± 0.072; P < .001) in eyes with suspected glaucoma. The structure-function relationship was significantly stronger with spectral-domain OCT than with scanning laser polarimetry, and was better expressed logarithmically than linearly. Measurements with these 2 instruments should not be considered to be interchangeable. Copyright © 2010 Elsevier Inc. All rights reserved.
Specific factors for prenatal lead exposure in the border area of China.
Kawata, Kimiko; Li, Yan; Liu, Hao; Zhang, Xiao Qin; Ushijima, Hiroshi
2006-07-01
The objectives of this study are to examine the prevalence of increased blood lead concentrations in mothers and their umbilical cords, and to identify risk factors for prenatal lead exposure in Kunming city, Yunnan province, China. The study was conducted at two obstetrics departments, and 100 peripartum women were enrolled. The mean blood lead concentrations of the mothers and the umbilical cords were 67.3microg/l and 53.1microg/l, respectively. In multiple linear regression analysis, maternal occupational exposure, maternal consumption of homemade dehydrated vegetables and maternal habitation period in Kunming city were significantly associated with an increase of umbilical cord blood lead concentration. In addition, logistic regression analysis was used to assess the association of umbilical cord blood lead concentrations that possibly have adverse effects on brain development of newborns with each potential risk factor. Maternal frequent use of tableware with color patterns inside was significantly associated with higher cord blood lead concentration in addition to the three items in the multiple linear regression analysis. These points should be considered as specific recommendations for maternal and fetal lead exposure in this city.
do Prado, Mara Rúbia Maciel Cardoso; Oliveira, Fabiana de Cássia Carvalho; Assis, Karine Franklin; Ribeiro, Sarah Aparecida Vieira; do Prado Junior, Pedro Paulo; Sant'Ana, Luciana Ferreira da Rocha; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro
2015-01-01
To assess the prevalence of vitamin D deficiency and its associated factors in women and their newborns in the postpartum period. This cross-sectional study evaluated vitamin D deficiency/insufficiency in 226 women and their newborns in Viçosa (Minas Gerais, BR) between December 2011 and November 2012. Cord blood and venous maternal blood were collected to evaluate the following biochemical parameters: vitamin D, alkaline phosphatase, calcium, phosphorus and parathyroid hormone. Poisson regression analysis, with a confidence interval of 95% was applied to assess vitamin D deficiency and its associated factors. Multiple linear regression analysis was performed to identify factors associated with 25(OH)D deficiency in the newborns and women from the study. The criteria for variable inclusion in the multiple linear regression model was the association with the dependent variable in the simple linear regression analysis, considering p<0.20. Significance level was α<5%. From 226 women included, 200 (88.5%) were 20 to 44 years old; the median age was 28 years. Deficient/insufficient levels of vitamin D were found in 192 (85%) women and in 182 (80.5%) neonates. The maternal 25(OH)D and alkaline phosphatase levels were independently associated with vitamin D deficiency in infants. This study identified a high prevalence of vitamin D deficiency and insufficiency in women and newborns and the association between maternal nutritional status of vitamin D and their infants' vitamin D status. Copyright © 2015 Sociedade de Pediatria de São Paulo. Publicado por Elsevier Editora Ltda. All rights reserved.
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-01-15
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions namely coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2) was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
The Relationship Between Surface Curvature and Abdominal Aortic Aneurysm Wall Stress.
de Galarreta, Sergio Ruiz; Cazón, Aitor; Antón, Raúl; Finol, Ender A
2017-08-01
The maximum diameter (MD) criterion is the most important factor when predicting risk of rupture of abdominal aortic aneurysms (AAAs). An elevated wall stress has also been linked to a high risk of aneurysm rupture, yet is an uncommon clinical practice to compute AAA wall stress. The purpose of this study is to assess whether other characteristics of the AAA geometry are statistically correlated with wall stress. Using in-house segmentation and meshing algorithms, 30 patient-specific AAA models were generated for finite element analysis (FEA). These models were subsequently used to estimate wall stress and maximum diameter and to evaluate the spatial distributions of wall thickness, cross-sectional diameter, mean curvature, and Gaussian curvature. Data analysis consisted of statistical correlations of the aforementioned geometry metrics with wall stress for the 30 AAA inner and outer wall surfaces. In addition, a linear regression analysis was performed with all the AAA wall surfaces to quantify the relationship of the geometric indices with wall stress. These analyses indicated that while all the geometry metrics have statistically significant correlations with wall stress, the local mean curvature (LMC) exhibits the highest average Pearson's correlation coefficient for both inner and outer wall surfaces. The linear regression analysis revealed coefficients of determination for the outer and inner wall surfaces of 0.712 and 0.516, respectively, with LMC having the largest effect on the linear regression equation with wall stress. This work underscores the importance of evaluating AAA mean wall curvature as a potential surrogate for wall stress.
Agarwal, Parul; Sambamoorthi, Usha
2015-12-01
Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least square regressions (OLS) were used to examine differences in healthcare expenditures between those with and without depression. Post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analysis accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p < 0.001) compared to those without depression. Post-regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.
Nonlinear multivariate and time series analysis by neural network methods
NASA Astrophysics Data System (ADS)
Hsieh, William W.
2004-03-01
Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.
ERIC Educational Resources Information Center
Wiley, Kristofor R.
2013-01-01
Many of the social and emotional needs that have historically been associated with gifted students have been questioned on the basis of recent empirical evidence. Research on the topic, however, is often limited by sample size, selection bias, or definition. This study addressed these limitations by applying linear regression methodology to data…
ERIC Educational Resources Information Center
Fan, Xitao; Wang, Lin
The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Equilibrium, kinetics and process design of acid yellow 132 adsorption onto red pine sawdust.
Can, Mustafa
2015-01-01
Linear and non-linear regression procedures have been applied to the Langmuir, Freundlich, Tempkin, Dubinin-Radushkevich, and Redlich-Peterson isotherms for adsorption of acid yellow 132 (AY132) dye onto red pine (Pinus resinosa) sawdust. The effects of parameters such as particle size, stirring rate, contact time, dye concentration, adsorption dose, pH, and temperature were investigated, and interaction was characterized by Fourier transform infrared spectroscopy and field emission scanning electron microscope. The non-linear method of the Langmuir isotherm equation was found to be the best fitting model to the equilibrium data. The maximum monolayer adsorption capacity was found as 79.5 mg/g. The calculated thermodynamic results suggested that AY132 adsorption onto red pine sawdust was an exothermic, physisorption, and spontaneous process. Kinetics was analyzed by four different kinetic equations using non-linear regression analysis. The pseudo-second-order equation provides the best fit with experimental data.
An Analysis of San Diego's Housing Market Using a Geographically Weighted Regression Approach
NASA Astrophysics Data System (ADS)
Grant, Christina P.
San Diego County real estate transaction data was evaluated with a set of linear models calibrated by ordinary least squares and geographically weighted regression (GWR). The goal of the analysis was to determine whether the spatial effects assumed to be in the data are best studied globally with no spatial terms, globally with a fixed effects submarket variable, or locally with GWR. 18,050 single-family residential sales which closed in the six months between April 2014 and September 2014 were used in the analysis. Diagnostic statistics including AICc, R2, Global Moran's I, and visual inspection of diagnostic plots and maps indicate superior model performance by GWR as compared to both global regressions.
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.
2013-01-01
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Combined analysis of magnetic and gravity anomalies using normalized source strength (NSS)
NASA Astrophysics Data System (ADS)
Li, L.; Wu, Y.
2017-12-01
Gravity field and magnetic field belong to potential fields which lead inherent multi-solution. Combined analysis of magnetic and gravity anomalies based on Poisson's relation is used to determinate homology gravity and magnetic anomalies and decrease the ambiguity. The traditional combined analysis uses the linear regression of the reduction to pole (RTP) magnetic anomaly to the first order vertical derivative of the gravity anomaly, and provides the quantitative or semi-quantitative interpretation by calculating the correlation coefficient, slope and intercept. In the calculation process, due to the effect of remanent magnetization, the RTP anomaly still contains the effect of oblique magnetization. In this case the homology gravity and magnetic anomalies display irrelevant results in the linear regression calculation. The normalized source strength (NSS) can be transformed from the magnetic tensor matrix, which is insensitive to the remanence. Here we present a new combined analysis using NSS. Based on the Poisson's relation, the gravity tensor matrix can be transformed into the pseudomagnetic tensor matrix of the direction of geomagnetic field magnetization under the homologous condition. The NSS of pseudomagnetic tensor matrix and original magnetic tensor matrix are calculated and linear regression analysis is carried out. The calculated correlation coefficient, slope and intercept indicate the homology level, Poisson's ratio and the distribution of remanent respectively. We test the approach using synthetic model under complex magnetization, the results show that it can still distinguish the same source under the condition of strong remanence, and establish the Poisson's ratio. Finally, this approach is applied in China. The results demonstrated that our approach is feasible.
Brown, Angus M
2006-04-01
The objective of this present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet Microsoft Excel. SOLVER minimizes the difference between the sum of the squares of the data to be fit and the function(s) describing the data using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, usually requiring specialized expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.
Computerized dynamic posturography: the influence of platform stability on postural control.
Palm, Hans-Georg; Lang, Patricia; Strobel, Johannes; Riesner, Hans-Joachim; Friemert, Benedikt
2014-01-01
Postural stability can be quantified using posturography systems, which allow different foot platform stability settings to be selected. It is unclear, however, how platform stability and postural control are mathematically correlated. Twenty subjects performed tests on the Biodex Stability System at all 13 stability levels. Overall stability index, medial-lateral stability index, and anterior-posterior stability index scores were calculated, and data were analyzed using analysis of variance and linear regression analysis. A decrease in platform stability from the static level to the second least stable level was associated with a linear decrease in postural control. The overall stability index scores were 1.5 ± 0.8 degrees (static), 2.2 ± 0.9 degrees (level 8), and 3.6 ± 1.7 degrees (level 2). The slope of the regression lines was 0.17 for the men and 0.10 for the women. A linear correlation was demonstrated between platform stability and postural control. The influence of stability levels seems to be almost twice as high in men as in women.
NASA Technical Reports Server (NTRS)
Mcgwire, K.; Friedl, M.; Estes, J. E.
1993-01-01
This article describes research related to sampling techniques for establishing linear relations between land surface parameters and remotely-sensed data. Predictive relations are estimated between percentage tree cover in a savanna environment and a normalized difference vegetation index (NDVI) derived from the Thematic Mapper sensor. Spatial autocorrelation in original measurements and regression residuals is examined using semi-variogram analysis at several spatial resolutions. Sampling schemes are then tested to examine the effects of autocorrelation on predictive linear models in cases of small sample sizes. Regression models between image and ground data are affected by the spatial resolution of analysis. Reducing the influence of spatial autocorrelation by enforcing minimum distances between samples may also improve empirical models which relate ground parameters to satellite data.
NASA Astrophysics Data System (ADS)
Singh, S.; Jaishi, H. P.; Tiwari, R. P.; Tiwari, R. C.
2017-07-01
This paper reports the analysis of soil radon data recorded in the seismic zone-V, located in the northeastern part of India (latitude 23.73N, longitude 92.73E). Continuous measurements of soil-gas emission along Chite fault in Mizoram (India) were carried out with the replacement of solid-state nuclear track detectors at weekly interval. The present study was done for the period from March 2013 to May 2015 using LR-115 Type II detectors, manufactured by Kodak Pathe, France. In order to reduce the influence of meteorological parameters, statistical analysis tools such as multiple linear regression and artificial neural network have been used. Decrease in radon concentration was recorded prior to some earthquakes that occurred during the observation period. Some false anomalies were also recorded which may be attributed to the ongoing crustal deformation which was not major enough to produce an earthquake.
Flora, David B.; LaBrish, Cathy; Chalmers, R. Philip
2011-01-01
We provide a basic review of the data screening and assumption testing issues relevant to exploratory and confirmatory factor analysis along with practical advice for conducting analyses that are sensitive to these concerns. Historically, factor analysis was developed for explaining the relationships among many continuous test scores, which led to the expression of the common factor model as a multivariate linear regression model with observed, continuous variables serving as dependent variables, and unobserved factors as the independent, explanatory variables. Thus, we begin our paper with a review of the assumptions for the common factor model and data screening issues as they pertain to the factor analysis of continuous observed variables. In particular, we describe how principles from regression diagnostics also apply to factor analysis. Next, because modern applications of factor analysis frequently involve the analysis of the individual items from a single test or questionnaire, an important focus of this paper is the factor analysis of items. Although the traditional linear factor model is well-suited to the analysis of continuously distributed variables, commonly used item types, including Likert-type items, almost always produce dichotomous or ordered categorical variables. We describe how relationships among such items are often not well described by product-moment correlations, which has clear ramifications for the traditional linear factor analysis. An alternative, non-linear factor analysis using polychoric correlations has become more readily available to applied researchers and thus more popular. Consequently, we also review the assumptions and data-screening issues involved in this method. Throughout the paper, we demonstrate these procedures using an historic data set of nine cognitive ability variables. PMID:22403561
Quantile regression for the statistical analysis of immunological data with many non-detects.
Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth
2012-07-07
Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
Athanasopoulos, Leonidas V; Dritsas, Athanasios; Doll, Helen A; Cokkinos, Dennis V
2010-08-01
This study was conducted to explain the variance in quality of life (QoL) and activity capacity of patients with congestive heart failure from pathophysiological changes as estimated by laboratory data. Peak oxygen consumption (peak VO2) and ventilation (VE)/carbon dioxide output (VCO2) slope derived from cardiopulmonary exercise testing, plasma N-terminal prohormone of B-type natriuretic peptide (NT-proBNP), and echocardiographic markers [left atrium (LA), left ventricular ejection fraction (LVEF)] were measured in 62 patients with congestive heart failure, who also completed the Minnesota Living with Heart Failure Questionnaire and the Specific Activity Questionnaire. All regression models were adjusted for age and sex. On linear regression analysis, peak VO2 with P value less than 0.001, VE/VCO2 slope with P value less than 0.01, LVEF with P value less than 0.001, LA with P=0.001, and logNT-proBNP with P value less than 0.01 were found to be associated with QoL. On stepwise multiple linear regression, peak VO2 and LVEF continued to be predictive, accounting for 40% of the variability in Minnesota Living with Heart Failure Questionnaire score. On linear regression analysis, peak VO2 with P value less than 0.001, VE/VCO2 slope with P value less than 0.001, LVEF with P value less than 0.05, LA with P value less than 0.001, and logNT-proBNP with P value less than 0.001 were found to be associated with activity capacity. On stepwise multiple linear regression, peak VO2 and LA continued to be predictive, accounting for 53% of the variability in Specific Activity Questionnaire score. Peak VO2 is independently associated both with QoL and activity capacity. In addition to peak VO2, LVEF is independently associated with QoL, and LA with activity capacity.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Pease, J M; Morselli, M F
1987-01-01
This paper deals with a computer program adapted to a statistical method for analyzing an unlimited quantity of binary recorded data of an independent circular variable (e.g. wind direction), and a linear variable (e.g. maple sap flow volume). Circular variables cannot be statistically analyzed with linear methods, unless they have been transformed. The program calculates a critical quantity, the acrophase angle (PHI, phi o). The technique is adapted from original mathematics [1] and is written in Fortran 77 for easier conversion between computer networks. Correlation analysis can be performed following the program or regression which, because of the circular nature of the independent variable, becomes periodic regression. The technique was tested on a file of approximately 4050 data pairs.
Analysis of the Effects of the Commander’s Battle Positioning on Unit Combat Performance
1991-03-01
Analysis ......... .. 58 Logistic Regression Analysis ......... .. 61 Canonical Correlation Analysis ........ .. 62 Descriminant Analysis...entails classifying objects into two or more distinct groups, or responses. Dillon defines descriminant analysis as "deriving linear combinations of the...object given it’s predictor variables. The second objective is, through analysis of the parameters of the descriminant functions, determine those
NASA Astrophysics Data System (ADS)
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
The Influential Effect of Blending, Bump, Changing Period, and Eclipsing Cepheids on the Leavitt Law
NASA Astrophysics Data System (ADS)
García-Varela, A.; Muñoz, J. R.; Sabogal, B. E.; Vargas Domínguez, S.; Martínez, J.
2016-06-01
The investigation of the nonlinearity of the Leavitt law (LL) is a topic that began more than seven decades ago, when some of the studies in this field found that the LL has a break at about 10 days. The goal of this work is to investigate a possible statistical cause of this nonlinearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that to obtain the LL by using linear regression, robust techniques to deal with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, by using M- and MM-regressions we establish firmly and without doubt the linearity of the LL in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids suggesting blending, bumps, eclipses, or period changes do not affect the LL for this galaxy. For the Small Magellanic Cloud, when including Cepheids of this kind, it is not possible to find an adequate model, probably because of the geometry of the galaxy. In that case, a possible influence of these stars could exist.
Multiple regression technique for Pth degree polynominals with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha
2012-05-01
Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study is aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements; hand length, hand breadth, foot length and foot breadth taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formula were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from regression analysis method is less than that of multiplication factor method thus, confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Jose F. Negron
1998-01-01
Infested and uninfested areas within Douglas fir, Pseudotsuga menziesii Mirb.. Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk. were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...
Artes, Paul H; Crabb, David P
2010-01-01
To investigate why the specificity of the Moorfields Regression Analysis (MRA) of the Heidelberg Retina Tomograph (HRT) varies with disc size, and to derive accurate normative limits for neuroretinal rim area to address this problem. Two datasets from healthy subjects (Manchester, UK, n = 88; Halifax, Nova Scotia, Canada, n = 75) were used to investigate the physiological relationship between the optic disc and neuroretinal rim area. Normative limits for rim area were derived by quantile regression (QR) and compared with those of the MRA (derived by linear regression). Logistic regression analyses were performed to quantify the association between disc size and positive classifications with the MRA, as well as with the QR-derived normative limits. In both datasets, the specificity of the MRA depended on optic disc size. The odds of observing a borderline or outside-normal-limits classification increased by approximately 10% for each 0.1 mm(2) increase in disc area (P < 0.1). The lower specificity of the MRA with large optic discs could be explained by the failure of linear regression to model the extremes of the rim area distribution (observations far from the mean). In comparison, the normative limits predicted by QR were larger for smaller discs (less specific, more sensitive), and smaller for larger discs, such that false-positive rates became independent of optic disc size. Normative limits derived by quantile regression appear to remove the size-dependence of specificity with the MRA. Because quantile regression does not rely on the restrictive assumptions of standard linear regression, it may be a more appropriate method for establishing normative limits in other clinical applications where the underlying distributions are nonnormal or have nonconstant variance.
2013-01-01
Background This study aims to improve accuracy of Bioelectrical Impedance Analysis (BIA) prediction equations for estimating fat free mass (FFM) of the elderly by using non-linear Back Propagation Artificial Neural Network (BP-ANN) model and to compare the predictive accuracy with the linear regression model by using energy dual X-ray absorptiometry (DXA) as reference method. Methods A total of 88 Taiwanese elderly adults were recruited in this study as subjects. Linear regression equations and BP-ANN prediction equation were developed using impedances and other anthropometrics for predicting the reference FFM measured by DXA (FFMDXA) in 36 male and 26 female Taiwanese elderly adults. The FFM estimated by BIA prediction equations using traditional linear regression model (FFMLR) and BP-ANN model (FFMANN) were compared to the FFMDXA. The measuring results of an additional 26 elderly adults were used to validate than accuracy of the predictive models. Results The results showed the significant predictors were impedance, gender, age, height and weight in developed FFMLR linear model (LR) for predicting FFM (coefficient of determination, r2 = 0.940; standard error of estimate (SEE) = 2.729 kg; root mean square error (RMSE) = 2.571kg, P < 0.001). The above predictors were set as the variables of the input layer by using five neurons in the BP-ANN model (r2 = 0.987 with a SD = 1.192 kg and relatively lower RMSE = 1.183 kg), which had greater (improved) accuracy for estimating FFM when compared with linear model. The results showed a better agreement existed between FFMANN and FFMDXA than that between FFMLR and FFMDXA. Conclusion When compared the performance of developed prediction equations for estimating reference FFMDXA, the linear model has lower r2 with a larger SD in predictive results than that of BP-ANN model, which indicated ANN model is more suitable for estimating FFM. PMID:23388042
Practical Session: Multiple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Three exercises are proposed to illustrate the simple linear regression. In the first one investigates the influence of several factors on atmospheric pollution. It has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data coming from 20 cities of U.S. Exercise 2 is an introduction to model selection whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).
Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method
NASA Astrophysics Data System (ADS)
Asavaskulkiet, Krissada
2018-04-01
In this paper, we propose a new face hallucination technique, face images reconstruction in HSV color space with a semi-orthogonal multilinear principal component analysis method. This novel hallucination technique can perform directly from tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from FERET database to test our hallucination approach which is demonstrated by extensive experiments with high-quality hallucinated color faces. The experimental results assure clearly demonstrated that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.
Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki
2017-05-01
The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, variability of the aforementioned parameters, gait velocity, cadence, reaction time from starting signal to first step, and minimum distance between the foot and a marker placed to 3 in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P < 0.01), which was not significant higher correlation than TUG test time. We showed which TUG test parameters were associated with each motor function test. Moreover, the TUG test time regarded as the lower extremity function and mobility has strong predictive ability in each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.
ERIC Educational Resources Information Center
Vidal, Sherry
Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…
1990-09-01
without the help from the DSXR staff. William Lyons, Charles Ramsey , and Martin Meeks went above and beyond to help complete this research. Special...develop a valid forecasting model that is significantly more accurate than the one presently used by DSXR and suggested the development and testing of a...method, Strom tested DSXR’s iterative linear regression forecasting technique by examining P1 in the simple regression equation to determine whether
Kumar, K Vasanth
2006-10-11
Batch kinetic experiments were carried out for the sorption of methylene blue onto activated carbon. The experimental kinetics were fitted to the pseudo first-order and pseudo second-order kinetics by linear and a non-linear method. The five different types of Ho pseudo second-order expression have been discussed. A comparison of linear least-squares method and a trial and error non-linear method of estimating the pseudo second-order rate kinetic parameters were examined. The sorption process was found to follow a both pseudo first-order kinetic and pseudo second-order kinetic model. Present investigation showed that it is inappropriate to use a type 1 and type pseudo second-order expressions as proposed by Ho and Blanachard et al. respectively for predicting the kinetic rate constants and the initial sorption rate for the studied system. Three correct possible alternate linear expressions (type 2 to type 4) to better predict the initial sorption rate and kinetic rate constants for the studied system (methylene blue/activated carbon) was proposed. Linear method was found to check only the hypothesis instead of verifying the kinetic model. Non-linear regression method was found to be the more appropriate method to determine the rate kinetic parameters.
Cook, James P; Mahajan, Anubha; Morris, Andrew P
2017-02-01
Linear mixed models are increasingly used for the analysis of genome-wide association studies (GWAS) of binary phenotypes because they can efficiently and robustly account for population stratification and relatedness through inclusion of random effects for a genetic relationship matrix. However, the utility of linear (mixed) models in the context of meta-analysis of GWAS of binary phenotypes has not been previously explored. In this investigation, we present simulations to compare the performance of linear and logistic regression models under alternative weighting schemes in a fixed-effects meta-analysis framework, considering designs that incorporate variable case-control imbalance, confounding factors and population stratification. Our results demonstrate that linear models can be used for meta-analysis of GWAS of binary phenotypes, without loss of power, even in the presence of extreme case-control imbalance, provided that one of the following schemes is used: (i) effective sample size weighting of Z-scores or (ii) inverse-variance weighting of allelic effect sizes after conversion onto the log-odds scale. Our conclusions thus provide essential recommendations for the development of robust protocols for meta-analysis of binary phenotypes with linear models.
Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.
ERIC Educational Resources Information Center
Olson, Jeffery E.
Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…
Abu Bakar, S N; Aspalilah, A; AbdelNasser, I; Nurliza, A; Hairuliza, M J; Swarhib, M; Das, S; Mohd Nor, F
2017-01-01
Stature is one of the characteristics that could be used to identify human, besides age, sex and racial affiliation. This is useful when the body found is either dismembered, mutilated or even decomposed, and helps in narrowing down the missing person's identity. The main aim of the present study was to construct regression functions for stature estimation by using lower limb bones in the Malaysian population. The sample comprised 87 adult individuals (81 males, 6 females) aged between 20 to 79 years. The parameters such as thigh length, lower leg length, leg length, foot length, foot height and foot breadth were measured. They were measured by a ruler and measuring tape. Statistical analysis involved independent t-test to analyse the difference between lower limbs in male and female. The Pearson's correlation test was used to analyse correlations between lower limb parameters and stature, and the linear regressions were used to form equations. The paired t-test was used to compare between actual stature and estimated stature by using the equations formed. Using independent t-test, there was a significant difference (p< 0.05) in the measurement between males and females with regard to leg length, thigh length, lower leg length, foot length and foot breadth. The thigh length, leg length and foot length were observed to have strong correlations with stature with p= 0.75, p= 0.81 and p= 0.69, respectively. Linear regressions were formulated for stature estimation. Paired t-test showed no significant difference between actual stature and estimated stature. It is concluded that regression functions can be used to estimate stature to identify skeletal remains in the Malaysia population.
Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M
In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.
1974-01-01
REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
Lamm, Steven H; Ferdosi, Hamid; Dissen, Elisabeth K; Li, Ji; Ahn, Jaeil
2015-12-07
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1-1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100-150 µg/L arsenic.
Lamm, Steven H.; Ferdosi, Hamid; Dissen, Elisabeth K.; Li, Ji; Ahn, Jaeil
2015-01-01
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1–1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100–150 µg/L arsenic. PMID:26690190
Independent contrasts and PGLS regression estimators are equivalent.
Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary
2012-05-01
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
Element enrichment factor calculation using grain-size distribution and functional data regression.
Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R
2015-01-01
In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Thieler, E. Robert; Himmelstoss, Emily A.; Zichichi, Jessica L.; Ergul, Ayhan
2009-01-01
The Digital Shoreline Analysis System (DSAS) version 4.0 is a software extension to ESRI ArcGIS v.9.2 and above that enables a user to calculate shoreline rate-of-change statistics from multiple historic shoreline positions. A user-friendly interface of simple buttons and menus guides the user through the major steps of shoreline change analysis. Components of the extension and user guide include (1) instruction on the proper way to define a reference baseline for measurements, (2) automated and manual generation of measurement transects and metadata based on user-specified parameters, and (3) output of calculated rates of shoreline change and other statistical information. DSAS computes shoreline rates of change using four different methods: (1) endpoint rate, (2) simple linear regression, (3) weighted linear regression, and (4) least median of squares. The standard error, correlation coefficient, and confidence interval are also computed for the simple and weighted linear-regression methods. The results of all rate calculations are output to a table that can be linked to the transect file by a common attribute field. DSAS is intended to facilitate the shoreline change-calculation process and to provide rate-of-change information and the statistical data necessary to establish the reliability of the calculated results. The software is also suitable for any generic application that calculates positional change over time, such as assessing rates of change of glacier limits in sequential aerial photos, river edge boundaries, land-cover changes, and so on.
NASA Astrophysics Data System (ADS)
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
Who Will Win?: Predicting the Presidential Election Using Linear Regression
ERIC Educational Resources Information Center
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Schwartz, C. S.
2017-12-01
Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.
Forecasting daily patient volumes in the emergency department.
Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L
2008-02-01
Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
Advanced Statistics for Exotic Animal Practitioners.
Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G
2017-09-01
Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
Robust neural network with applications to credit portfolio data analysis.
Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun
2010-01-01
In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.
Estimating linear effects in ANOVA designs: the easy way.
Pinhas, Michal; Tzelgov, Joseph; Ganor-Stern, Dana
2012-09-01
Research in cognitive science has documented numerous phenomena that are approximated by linear relationships. In the domain of numerical cognition, the use of linear regression for estimating linear effects (e.g., distance and SNARC effects) became common following Fias, Brysbaert, Geypens, and d'Ydewalle's (1996) study on the SNARC effect. While their work has become the model for analyzing linear effects in the field, it requires statistical analysis of individual participants and does not provide measures of the proportions of variability accounted for (cf. Lorch & Myers, 1990). In the present methodological note, using both the distance and SNARC effects as examples, we demonstrate how linear effects can be estimated in a simple way within the framework of repeated measures analysis of variance. This method allows for estimating effect sizes in terms of both slope and proportions of variability accounted for. Finally, we show that our method can easily be extended to estimate linear interaction effects, not just linear effects calculated as main effects.
ERIC Educational Resources Information Center
Thum, Yeow Meng; Bhattacharya, Suman Kumar
To better describe individual behavior within a system, this paper uses a sample of longitudinal test scores from a large urban school system to consider hierarchical Bayes estimation of a multilevel linear regression model in which each individual regression slope of test score on time switches at some unknown point in time, "kj."…
ERIC Educational Resources Information Center
Gelman, Andrew; Imbens, Guido
2014-01-01
It is common in regression discontinuity analysis to control for high order (third, fourth, or higher) polynomials of the forcing variable. We argue that estimators for causal effects based on such methods can be misleading, and we recommend researchers do not use them, and instead use estimators based on local linear or quadratic polynomials or…
Shillcutt, Samuel D; LeFevre, Amnesty E; Fischer-Walker, Christa L; Taneja, Sunita; Black, Robert E; Mazumder, Sarmila
2017-01-01
This study evaluates the cost-effectiveness of the DAZT program for scaling up treatment of acute child diarrhea in Gujarat India using a net-benefit regression framework. Costs were calculated from societal and caregivers' perspectives and effectiveness was assessed in terms of coverage of zinc and both zinc and Oral Rehydration Salt. Regression models were tested in simple linear regression, with a specified set of covariates, and with a specified set of covariates and interaction terms using linear regression with endogenous treatment effects was used as the reference case. The DAZT program was cost-effective with over 95% certainty above $5.50 and $7.50 per appropriately treated child in the unadjusted and adjusted models respectively, with specifications including interaction terms being cost-effective with 85-97% certainty. Findings from this study should be combined with other evidence when considering decisions to scale up programs such as the DAZT program to promote the use of ORS and zinc to treat child diarrhea.
Ladstätter, Felix; Garrosa, Eva; Moreno-Jiménez, Bernardo; Ponsoda, Vicente; Reales Aviles, José Manuel; Dai, Junming
2016-01-01
Artificial neural networks are sophisticated modelling and prediction tools capable of extracting complex, non-linear relationships between predictor (input) and predicted (output) variables. This study explores this capacity by modelling non-linearities in the hardiness-modulated burnout process with a neural network. Specifically, two multi-layer feed-forward artificial neural networks are concatenated in an attempt to model the composite non-linear burnout process. Sensitivity analysis, a Monte Carlo-based global simulation technique, is then utilised to examine the first-order effects of the predictor variables on the burnout sub-dimensions and consequences. Results show that (1) this concatenated artificial neural network approach is feasible to model the burnout process, (2) sensitivity analysis is a prolific method to study the relative importance of predictor variables and (3) the relationships among variables involved in the development of burnout and its consequences are to different degrees non-linear. Many relationships among variables (e.g., stressors and strains) are not linear, yet researchers use linear methods such as Pearson correlation or linear regression to analyse these relationships. Artificial neural network analysis is an innovative method to analyse non-linear relationships and in combination with sensitivity analysis superior to linear methods.
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions.
Drouard, Vincent; Horaud, Radu; Deleforge, Antoine; Ba, Sileye; Evangelidis, Georgios
2017-03-01
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available data sets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.
Graphical methods for the sensitivity analysis in discriminant analysis
Kim, Youngil; Anderson-Cook, Christine M.; Dae-Heung, Jang
2015-09-30
Similar to regression, many measures to detect influential data points in discriminant analysis have been developed. Many follow similar principles as the diagnostic measures used in linear regression in the context of discriminant analysis. Here we focus on the impact on the predicted classification posterior probability when a data point is omitted. The new method is intuitive and easily interpretative compared to existing methods. We also propose a graphical display to show the individual movement of the posterior probability of other data points when a specific data point is omitted. This enables the summaries to capture the overall pattern ofmore » the change.« less
Bageshwar, Deepak; Khanvilkar, Vineeta; Kadam, Vilasrao
2011-01-01
A specific, precise and stability indicating high-performance thin-layer chromatographic method for simultaneous estimation of pantoprazole sodium and itopride hydrochloride in pharmaceutical formulations was developed and validated. The method employed TLC aluminium plates precoated with silica gel 60F254 as the stationary phase. The solvent system consisted of methanol:water:ammonium acetate; 4.0:1.0:0.5 (v/v/v). This system was found to give compact and dense spots for both itopride hydrochloride (Rf value of 0.55±0.02) and pantoprazole sodium (Rf value of 0.85±0.04). Densitometric analysis of both drugs was carried out in the reflectance–absorbance mode at 289 nm. The linear regression analysis data for the calibration plots showed a good linear relationship with R2=0.9988±0.0012 in the concentration range of 100–400 ng for pantoprazole sodium. Also, the linear regression analysis data for the calibration plots showed a good linear relationship with R2=0.9990±0.0008 in the concentration range of 200–1200 ng for itopride hydrochloride. The method was validated for specificity, precision, robustness and recovery. Statistical analysis proves that the method is repeatable and selective for the estimation of both the said drugs. As the method could effectively separate the drug from its degradation products, it can be employed as a stability indicating method. PMID:29403710
Bageshwar, Deepak; Khanvilkar, Vineeta; Kadam, Vilasrao
2011-11-01
A specific, precise and stability indicating high-performance thin-layer chromatographic method for simultaneous estimation of pantoprazole sodium and itopride hydrochloride in pharmaceutical formulations was developed and validated. The method employed TLC aluminium plates precoated with silica gel 60F 254 as the stationary phase. The solvent system consisted of methanol:water:ammonium acetate; 4.0:1.0:0.5 (v/v/v). This system was found to give compact and dense spots for both itopride hydrochloride ( R f value of 0.55±0.02) and pantoprazole sodium ( R f value of 0.85±0.04). Densitometric analysis of both drugs was carried out in the reflectance-absorbance mode at 289 nm. The linear regression analysis data for the calibration plots showed a good linear relationship with R 2 =0.9988±0.0012 in the concentration range of 100-400 ng for pantoprazole sodium. Also, the linear regression analysis data for the calibration plots showed a good linear relationship with R 2 =0.9990±0.0008 in the concentration range of 200-1200 ng for itopride hydrochloride. The method was validated for specificity, precision, robustness and recovery. Statistical analysis proves that the method is repeatable and selective for the estimation of both the said drugs. As the method could effectively separate the drug from its degradation products, it can be employed as a stability indicating method.
Emission and distribution of phosphine in paddy fields and its relationship with greenhouse gases.
Chen, Weiyi; Niu, Xiaojun; An, Shaorong; Sheng, Hong; Tang, Zhenghua; Yang, Zhiquan; Gu, Xiaohong
2017-12-01
Phosphine (PH 3 ), as a gaseous phosphide, plays an important role in the phosphorus cycle in ecosystems. In this study, the emission and distribution of phosphine, carbon dioxide (CO 2 ) and methane (CH 4 ) in paddy fields were investigated to speculate the future potential impacts of enhanced greenhouse effect on phosphorus cycle involved in phosphine by the method of Pearson correlation analysis and multiple linear regression analysis. During the whole period of rice growth, there was a significant positive correlation between CO 2 emission flux and PH 3 emission flux (r=0.592, p=0.026, n=14). Similarly, a significant positive correlation of emission flux was also observed between CH 4 and PH 3 (r=0.563, p=0.036, n=14). The linear regression relationship was determined as [PH 3 ] flux =0.007[CO 2 ] flux +0.063[CH 4 ] flux -4.638. No significant differences were observed for all values of matrix-bound phosphine (MBP), soil carbon dioxide (SCO 2 ), and soil methane (SCH 4 ) in paddy soils. However, there was a significant positive correlation between MBP and SCO 2 at heading, flowering and ripening stage. The correlation coefficients were 0.909, 0.890 and 0.827, respectively. In vertical distribution, MBP had the analogical variation trend with SCO 2 and SCH 4 . Through Pearson correlation analysis and multiple stepwise linear regression analysis, pH, redox potential (Eh), total phosphorus (TP) and acid phosphatase (ACP) were identified as the principal factors affecting MBP levels, with correlative rankings of Eh>pH>TP>ACP. The multiple stepwise regression model ([MBP]=0.456∗[ACP]+0.235∗[TP]-1.458∗[Eh]-36.547∗[pH]+352.298) was obtained. The findings in this study hold great reference values to the global biogeochemical cycling of phosphorus in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
Publication Trends of Doctoral Students in Three Fields from 1965-1995.
ERIC Educational Resources Information Center
Lee, Wade M.
2000-01-01
Describes a study that investigated the publication rates of successful doctoral students in the fields of analytical chemistry, experimental psychology, and American literature. Data analysis, including linear regression analysis, revealed differences in publication rates and in solo authorship that mirrored differences between the fields as a…
We compared two regression models, which are based on the Weibull and probit functions, for the analysis of pesticide toxicity data from laboratory studies on Illinois crop and native plant species. Both mathematical models are continuous, differentiable, strictly positive, and...
da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues
2015-01-01
This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.
Regression Model Optimization for the Analysis of Experimental Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
2007-01-05
positive / false negatives. The quantitative on-site methods were evaluated using linear regression analysis and relative percent difference (RPD) comparison...Conclusion ...............................................................................................3-9 3.2 Quantitative Analysis Using CRREL...3-37 3.3 Quantitative Analysis for NG by GC/TID.........................................................3-38 3.3.1 Introduction
Yang, Ruiqi; Wang, Fei; Zhang, Jialing; Zhu, Chonglei; Fan, Limei
2015-05-19
To establish the reference values of thalamus, caudate nucleus and lenticular nucleus diameters through fetal thalamic transverse section. A total of 265 fetuses at our hospital were randomly selected from November 2012 to August 2014. And the transverse and length diameters of thalamus, caudate nucleus and lenticular nucleus were measured. SPSS 19.0 statistical software was used to calculate the regression curve of fetal diameter changes and gestational weeks of pregnancy. P < 0.05 was considered as having statistical significance. The linear regression equation of fetal thalamic length diameter and gestational week was: Y = 0.051X+0.201, R = 0.876, linear regression equation of thalamic transverse diameter and fetal gestational week was: Y = 0.031X+0.229, R = 0.817, linear regression equation of fetal head of caudate nucleus length diameter and gestational age was: Y = 0.033X+0.101, R = 0.722, linear regression equation of fetal head of caudate nucleus transverse diameter and gestational week was: R = 0.025 - 0.046, R = 0.711, linear regression equation of fetal lentiform nucleus length diameter and gestational week was: Y = 0.046+0.229, R = 0.765, linear regression equation of fetal lentiform nucleus diameter and gestational week was: Y = 0.025 - 0.05, R = 0.772. Ultrasonic measurement of diameter of fetal thalamus caudate nucleus, and lenticular nucleus through thalamic transverse section is simple and convenient. And measurements increase with fetal gestational weeks and there is linear regression relationship between them.
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate the simple linear regression. The first one is based on the famous Galton's data set on heredity. We use the lm R command and get coefficients estimates, standard error of the error, R2, residuals …In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate on multiple linear regression. This pratical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
Do Our Means of Inquiry Match our Intentions?
Petscher, Yaacov
2016-01-01
A key stage of the scientific method is the analysis of data, yet despite the variety of methods that are available to researchers they are most frequently distilled to a model that focuses on the average relation between variables. Although research questions are frequently conceived with broad inquiry in mind, most regression methods are limited in comprehensively evaluating how observed behaviors are related to each other. Quantile regression is a largely unknown yet well-suited analytic technique similar to traditional regression analysis, but allows for a more systematic approach to understanding complex associations among observed phenomena in the psychological sciences. Data from the National Education Longitudinal Study of 1988/2000 are used to illustrate how quantile regression overcomes the limitations of average associations in linear regression by showing that psychological well-being and sex each differentially relate to reading achievement depending on one’s level of reading achievement. PMID:27486410
Nejaim, Yuri; Aps, Johan K M; Groppo, Francisco Carlos; Haiter Neto, Francisco
2018-06-01
The purpose of this article was to evaluate the pharyngeal space volume, and the size and shape of the mandible and the hyoid bone, as well as their relationships, in patients with different facial types and skeletal classes. Furthermore, we estimated the volume of the pharyngeal space with a formula using only linear measurements. A total of 161 i-CAT Next Generation (Imaging Sciences International, Hatfield, Pa) cone-beam computed tomography images (80 men, 81 women; ages, 21-58 years; mean age, 27 years) were retrospectively studied. Skeletal class and facial type were determined for each patient from multiplanar reconstructions using the NemoCeph software (Nemotec, Madrid, Spain). Linear and angular measurements were performed using 3D imaging software (version 3.4.3; Carestream Health, Rochester, NY), and volumetric analysis of the pharyngeal space was carried out with ITK-SNAP (version 2.4.0; Cognitica, Philadelphia, Pa) segmentation software. For the statistics, analysis of variance and the Tukey test with a significance level of 0.05, Pearson correlation, and linear regression were used. The pharyngeal space volume, when correlated with mandible and hyoid bone linear and angular measurements, showed significant correlations with skeletal class or facial type. The linear regression performed to estimate the volume of the pharyngeal space showed an R of 0.92 and an adjusted R 2 of 0.8362. There were significant correlations between pharyngeal space volume, and the mandible and hyoid bone measurements, suggesting that the stomatognathic system should be evaluated in an integral and nonindividualized way. Furthermore, it was possible to develop a linear regression model, resulting in a useful formula for estimating the volume of the pharyngeal space. Copyright © 2018 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
Geszke-Moritz, Małgorzata; Moritz, Michał
2016-12-01
The present study deals with the adsorption of boldine onto pure and propyl-sulfonic acid-functionalized SBA-15, SBA-16 and mesocellular foam (MCF) materials. Siliceous adsorbents were characterized by nitrogen sorption analysis, transmission electron microscopy (TEM), scanning electron microscopy (SEM), Fourier-transform infrared (FT-IR) spectroscopy and thermogravimetric analysis. The equilibrium adsorption data were analyzed using the Langmuir, Freundlich, Redlich-Peterson, and Temkin isotherms. Moreover, the Dubinin-Radushkevich and Dubinin-Astakhov isotherm models based on the Polanyi adsorption potential were employed. The latter was calculated using two alternative formulas including solubility-normalized (S-model) and empirical C-model. In order to find the best-fit isotherm, both linear regression and nonlinear fitting analysis were carried out. The Dubinin-Astakhov (S-model) isotherm revealed the best fit to the experimental points for adsorption of boldine onto pure mesoporous materials using both linear and nonlinear fitting analysis. Meanwhile, the process of boldine sorption onto modified silicas was described the best by the Langmuir and Temkin isotherms using linear regression and nonlinear fitting analysis, respectively. The values of adsorption energy (below 8kJ/mol) indicate the physical nature of boldine adsorption onto unmodified silicas whereas the ionic interactions seem to be the main force of alkaloid adsorption onto functionalized sorbents (energy of adsorption above 8kJ/mol). Copyright © 2016 Elsevier B.V. All rights reserved.
Pointwise influence matrices for functional-response regression.
Reiss, Philip T; Huang, Lei; Wu, Pei-Shien; Chen, Huaihou; Colcombe, Stan
2017-12-01
We extend the notion of an influence or hat matrix to regression with functional responses and scalar predictors. For responses depending linearly on a set of predictors, our definition is shown to reduce to the conventional influence matrix for linear models. The pointwise degrees of freedom, the trace of the pointwise influence matrix, are shown to have an adaptivity property that motivates a two-step bivariate smoother for modeling nonlinear dependence on a single predictor. This procedure adapts to varying complexity of the nonlinear model at different locations along the function, and thereby achieves better performance than competing tensor product smoothers in an analysis of the development of white matter microstructure in the brain. © 2017, The International Biometric Society.
Chicken barn climate and hazardous volatile compounds control using simple linear regression and PID
NASA Astrophysics Data System (ADS)
Abdullah, A. H.; Bakar, M. A. A.; Shukor, S. A. A.; Saad, F. S. A.; Kamis, M. S.; Mustafa, M. H.; Khalid, N. S.
2016-07-01
The hazardous volatile compounds from chicken manure in chicken barn are potentially to be a health threat to the farm animals and workers. Ammonia (NH3) and hydrogen sulphide (H2S) produced in chicken barn are influenced by climate changes. The Electronic Nose (e-nose) is used for the barn's air, temperature and humidity data sampling. Simple Linear Regression is used to identify the correlation between temperature-humidity, humidity-ammonia and ammonia-hydrogen sulphide. MATLAB Simulink software was used for the sample data analysis using PID controller. Results shows that the performance of PID controller using the Ziegler-Nichols technique can improve the system controller to control climate in chicken barn.
Muradian, Kh K; Utko, N O; Mozzhukhina, T H; Pishel', I M; Litoshenko, O Ia; Bezrukov, V V; Fraĭfel'd, V E
2002-01-01
Correlative and regressive relations between the gaseous exchange, thermoregulation and mitochondrial protein content were analyzed by two- and three-dimensional statistics in mice. It has been shown that the pair wise linear methods of analysis did not reveal any significant correlation between the parameters under exploration. However, it became evident at three-dimensional and non-linear plotting for which the coefficients of multivariable correlation reached and even exceeded 0.7-0.8. The calculations based on partial differentiation of the multivariable regression equations allow to conclude that at certain values of VO2, VCO2 and body temperature negative relations between the systems of gaseous exchange and thermoregulation become dominating.
Morse Code, Scrabble, and the Alphabet
ERIC Educational Resources Information Center
Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss
2004-01-01
In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
Analyzing industrial energy use through ordinary least squares regression models
NASA Astrophysics Data System (ADS)
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.
NASA Technical Reports Server (NTRS)
Ohring, G.
1972-01-01
Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.
Sanagi, M Marsin; Nasir, Zalilah; Ling, Susie Lu; Hermawan, Dadan; Ibrahim, Wan Aini Wan; Naim, Ahmedy Abu
2010-01-01
Linearity assessment as required in method validation has always been subject to different interpretations and definitions by various guidelines and protocols. However, there are very limited applicable implementation procedures that can be followed by a laboratory chemist in assessing linearity. Thus, this work proposes a simple method for linearity assessment in method validation by a regression analysis that covers experimental design, estimation of the parameters, outlier treatment, and evaluation of the assumptions according to the International Union of Pure and Applied Chemistry guidelines. The suitability of this procedure was demonstrated by its application to an in-house validation for the determination of plasticizers in plastic food packaging by GC.
White, Sonia L J; Szűcs, Dénes
2012-01-04
The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also provide a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice.
2012-01-01
Background The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also provide a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Methods Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Results Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. Conclusion In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice. PMID:22217191
Changes in aerobic power of men, ages 25-70 yr
NASA Technical Reports Server (NTRS)
Jackson, A. S.; Beard, E. F.; Wier, L. T.; Ross, R. M.; Stuteville, J. E.; Blair, S. N.
1995-01-01
This study quantified and compared the cross-sectional and longitudinal influence of age, self-report physical activity (SR-PA), and body composition (%fat) on the decline of maximal aerobic power (VO2peak). The cross-sectional sample consisted of 1,499 healthy men ages 25-70 yr. The 156 men of the longitudinal sample were from the same population and examined twice, the mean time between tests was 4.1 (+/- 1.2) yr. Peak oxygen uptake was determined by indirect calorimetry during a maximal treadmill exercise test. The zero-order correlations between VO2peak and %fat (r = -0.62) and SR-PA (r = 0.58) were significantly (P < 0.05) higher that the age correlation (r = -0.45). Linear regression defined the cross-sectional age-related decline in VO2peak at 0.46 ml.kg-1.min-1.yr-1. Multiple regression analysis (R = 0.79) showed that nearly 50% of this cross-sectional decline was due to %fat and SR-PA, adding these lifestyle variables to the multiple regression model reduced the age regression weight to -0.26 ml.kg-1.min-1.yr-1. Statistically controlling for time differences between tests, general linear models analysis showed that longitudinal changes in aerobic power were due to independent changes in %fat and SR-PA, confirming the cross-sectional results.
Acoustic-articulatory mapping in vowels by locally weighted regression
McGowan, Richard S.; Berger, Michael A.
2009-01-01
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829–836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented. PMID:19813812
Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.
Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana
2012-02-01
In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P < 0.05 and P < 0.01) between the parameters of wheat dough studied by the Mixolab and its rheological properties measured with the Alveograph. A number of six predictive linear equations were obtained. Linear regression models gave multiple regression coefficients with R²(adjusted) > 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.
Supervised Learning for Dynamical System Learning.
Hefny, Ahmed; Downey, Carlton; Gordon, Geoffrey J
2015-01-01
Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as sparsity or structure. To address this problem, we present a new view of dynamical system learning: we show how to learn dynamical systems by solving a sequence of ordinary supervised learning problems, thereby allowing users to incorporate prior knowledge via standard techniques such as L 1 regularization. Many existing spectral methods are special cases of this new framework, using linear regression as the supervised learner. We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
Jastreboff, P W
1979-06-01
Time histograms of neural responses evoked by sinuosidal stimulation often contain a slow drifting and an irregular noise which disturb Fourier analysis of these responses. Section 2 of this paper evaluates the extent to which a linear drift influences the Fourier analysis, and develops a combined Fourier and linear regression analysis for detecting and correcting for such a linear drift. Usefulness of this correcting method is demonstrated for the time histograms of actual eye movements and Purkinje cell discharges evoked by sinusoidal rotation of rabbits in the horizontal plane. In Sect. 3, the analysis of variance is adopted for estimating the probability of the random occurrence of the response curve extracted by Fourier analysis from noise. This method proved to be useful for avoiding false judgements as to whether the response curve was meaningful, particularly when the response was small relative to the contaminating noise.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Zhang, Chao; Jia, Pengli; Yu, Liu; Xu, Chang
2018-05-01
Dose-response meta-analysis (DRMA) is widely applied to investigate the dose-specific relationship between independent and dependent variables. Such methods have been in use for over 30 years and are increasingly employed in healthcare and clinical decision-making. In this article, we give an overview of the methodology used in DRMA. We summarize the commonly used regression model and the pooled method in DRMA. We also use an example to illustrate how to employ a DRMA by these methods. Five regression models, linear regression, piecewise regression, natural polynomial regression, fractional polynomial regression, and restricted cubic spline regression, were illustrated in this article to fit the dose-response relationship. And two types of pooling approaches, that is, one-stage approach and two-stage approach are illustrated to pool the dose-response relationship across studies. The example showed similar results among these models. Several dose-response meta-analysis methods can be used for investigating the relationship between exposure level and the risk of an outcome. However the methodology of DRMA still needs to be improved. © 2018 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
Analysis of reciprocal creatinine plots by two-phase linear regression.
Rowe, P A; Richardson, R E; Burton, P R; Morgan, A G; Burden, R P
1989-01-01
The progression of renal diseases is often monitored by the serial measurement of plasma creatinine. The slope of the linear relation that is frequently found between the reciprocal of creatinine concentration and time delineates the rate of change in renal function. Minor changes in slope, perhaps indicating response to therapeutic intervention, can be difficult to identify and yet be of clinical importance. We describe the application of two-phase linear regression to identify and characterise changes in slope using a microcomputer. The method fits two intersecting lines to the data by computing a least-squares estimate of the position of the slope change and its 95% confidence limits. This avoids the potential bias of fixing the change at a preconceived time corresponding with an alteration in treatment. The program then evaluates the statistical and clinical significance of the slope change and produces a graphical output to aid interpretation.
Influence of prolonged static stretching on motor unit firing properties.
Ye, Xin; Beck, Travis W; Wages, Nathan P
2016-05-01
The purpose of this study was to examine the influence of a stretching intervention on motor control strategy of the biceps brachii muscle. Ten men performed twelve 100-s passive static stretches of the biceps brachii. Before and after the intervention, isometric strength was tested during maximal voluntary contractions (MVCs) of the elbow flexors. Subjects also performed trapezoid isometric contractions at 30% and 70% of MVC. Surface electromyographic signals from the submaximal contractions were decomposed into individual motor unit action potential trains. Linear regression analysis was used to examine the relationship between motor unit mean firing rate and recruitment threshold. The stretching intervention caused significant decreases in y-intercepts of the linear regression lines. In addition, linear slopes at both intensities remained unchanged. Despite reduced motor unit firing rates following the stretches, the motor control scheme remained unchanged. © 2016 Wiley Periodicals, Inc.
Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.
Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu
2016-08-01
This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Linking brain-wide multivoxel activation patterns to behaviour: Examples from language and math.
Raizada, Rajeev D S; Tsao, Feng-Ming; Liu, Huei-Mei; Holloway, Ian D; Ansari, Daniel; Kuhl, Patricia K
2010-05-15
A key goal of cognitive neuroscience is to find simple and direct connections between brain and behaviour. However, fMRI analysis typically involves choices between many possible options, with each choice potentially biasing any brain-behaviour correlations that emerge. Standard methods of fMRI analysis assess each voxel individually, but then face the problem of selection bias when combining those voxels into a region-of-interest, or ROI. Multivariate pattern-based fMRI analysis methods use classifiers to analyse multiple voxels together, but can also introduce selection bias via data-reduction steps as feature selection of voxels, pre-selecting activated regions, or principal components analysis. We show here that strong brain-behaviour links can be revealed without any voxel selection or data reduction, using just plain linear regression as a classifier applied to the whole brain at once, i.e. treating each entire brain volume as a single multi-voxel pattern. The brain-behaviour correlations emerged despite the fact that the classifier was not provided with any information at all about subjects' behaviour, but instead was given only the neural data and its condition-labels. Surprisingly, more powerful classifiers such as a linear SVM and regularised logistic regression produce very similar results. We discuss some possible reasons why the very simple brain-wide linear regression model is able to find correlations with behaviour that are as strong as those obtained on the one hand from a specific ROI and on the other hand from more complex classifiers. In a manner which is unencumbered by arbitrary choices, our approach offers a method for investigating connections between brain and behaviour which is simple, rigorous and direct. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Ghoreishi, Mohammad; Abdi-Shahshahani, Mehdi; Peyman, Alireza; Pourazizi, Mohsen
2018-02-21
The aim of this study was to determine the correlation between ocular biometric parameters and sulcus-to-sulcus (STS) diameter. This was a cross-sectional study of preoperative ocular biometry data of patients who were candidates for phakic intraocular lens (IOL) surgery. Subjects underwent ocular biometry analysis, including refraction error evaluation using an autorefractor and Orbscan topography for white-to-white (WTW) corneal diameter and measurement. Pentacam was used to perform WTW corneal diameter and measurements of minimum and maximum keratometry (K). Measurements of STS and angle-to-angle (ATA) were obtained using a 50-MHz B-mode ultrasound device. Anterior optical coherence tomography was performed for anterior chamber depth measurement. Pearson's correlation test and stepwise linear regression analysis were used to find a model to predict STS. Fifty-eight eyes of 58 patients were enrolled. Mean age ± standard deviation of sample was 28.95 ± 6.04 years. The Pearson's correlation coefficient between STS with WTW, ATA, mean K was 0.383, 0.492, and - 0.353, respectively, which was statistically significant (all P < 0.001). Using stepwise linear regression analysis, there is a statistically significant association between STS with WTW (P = 0.011) and mean K (P = 0.025). The standardized coefficient was 0.323 and - 0.284 for WTW and mean K, respectively. The stepwise linear regression analysis equation was: (STS = 9.549 + 0.518 WTW - 0.083 mean K). Based on our result, given the correlation of STS with WTW and mean K and potential of direct and essay measurement of WTW and mean K, it seems that current IOL sizing protocols could be estimating with WTW and mean K.
Linking brain-wide multivoxel activation patterns to behaviour: Examples from language and math
Raizada, Rajeev D.S.; Tsao, Feng-Ming; Liu, Huei-Mei; Holloway, Ian D.; Ansari, Daniel; Kuhl, Patricia K.
2010-01-01
A key goal of cognitive neuroscience is to find simple and direct connections between brain and behaviour. However, fMRI analysis typically involves choices between many possible options, with each choice potentially biasing any brain–behaviour correlations that emerge. Standard methods of fMRI analysis assess each voxel individually, but then face the problem of selection bias when combining those voxels into a region-of-interest, or ROI. Multivariate pattern-based fMRI analysis methods use classifiers to analyse multiple voxels together, but can also introduce selection bias via data-reduction steps as feature selection of voxels, pre-selecting activated regions, or principal components analysis. We show here that strong brain–behaviour links can be revealed without any voxel selection or data reduction, using just plain linear regression as a classifier applied to the whole brain at once, i.e. treating each entire brain volume as a single multi-voxel pattern. The brain–behaviour correlations emerged despite the fact that the classifier was not provided with any information at all about subjects' behaviour, but instead was given only the neural data and its condition-labels. Surprisingly, more powerful classifiers such as a linear SVM and regularised logistic regression produce very similar results. We discuss some possible reasons why the very simple brain-wide linear regression model is able to find correlations with behaviour that are as strong as those obtained on the one hand from a specific ROI and on the other hand from more complex classifiers. In a manner which is unencumbered by arbitrary choices, our approach offers a method for investigating connections between brain and behaviour which is simple, rigorous and direct. PMID:20132896
GWAS with longitudinal phenotypes: performance of approximate procedures
Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel
2015-01-01
Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. PMID:25712081
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
Construction of mathematical model for measuring material concentration by colorimetric method
NASA Astrophysics Data System (ADS)
Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua
2018-06-01
This paper use the method of multiple linear regression to discuss the data of C problem of mathematical modeling in 2017. First, we have established a regression model for the concentration of 5 substances. But only the regression model of the substance concentration of urea in milk can pass through the significance test. The regression model established by the second sets of data can pass the significance test. But this model exists serious multicollinearity. We have improved the model by principal component analysis. The improved model is used to control the system so that it is possible to measure the concentration of material by direct colorimetric method.
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
Age and mortality after injury: is the association linear?
Friese, R S; Wynne, J; Joseph, B; Hashmi, A; Diven, C; Pandit, V; O'Keeffe, T; Zangbar, B; Kulvatunyou, N; Rhee, P
2014-10-01
Multiple studies have demonstrated a linear association between advancing age and mortality after injury. An inflection point, or an age at which outcomes begin to differ, has not been previously described. We hypothesized that the relationship between age and mortality after injury is non-linear and an inflection point exists. We performed a retrospective cohort analysis at our urban level I center from 2007 through 2009. All patients aged 65 years and older with the admission diagnosis of injury were included. Non-parametric logistic regression was used to identify the functional form between mortality and age. Multivariate logistic regression was utilized to explore the association between age and mortality. Age 65 years was used as the reference. Significance was defined as p < 0.05. A total of 1,107 patients were included in the analysis. One-third required intensive care unit (ICU) admission and 48 % had traumatic brain injury. 229 patients (20.6 %) were 84 years of age or older. The overall mortality was 7.2 %. Our model indicates that mortality is a quadratic function of age. After controlling for confounders, age is associated with mortality with a regression coefficient of 1.08 for the linear term (p = 0.02) and a regression coefficient of -0.006 for the quadratic term (p = 0.03). The model identified 84.4 years of age as the inflection point at which mortality rates begin to decline. The risk of death after injury varies linearly with age until 84 years. After 84 years of age, the mortality rates decline. These findings may reflect the varying severity of comorbidities and differences in baseline functional status in elderly trauma patients. Specifically, a proportion of our injured patient population less than 84 years old may be more frail, contributing to increased mortality after trauma, whereas a larger proportion of our injured patients over 84 years old, by virtue of reaching this advanced age, may, in fact, be less frail, contributing to less risk of death.
Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2011-01-01
Students (N = 101) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without time advice) x 3 (with learning questions and corrective feedback, with…
Patterns of Library Use by Undergraduate Students in a Chilean University
ERIC Educational Resources Information Center
Jara, Magdalena; Clasing, Paula; Gonzalez, Carlos; Montenegro, Maximiliano; Kelly, Nick; Alarcón, Rosa; Sandoval, Augusto; Saurina, Elvira
2017-01-01
This paper explores the patterns of use of print materials and digital resources in an undergraduate library in a Chilean university, by the students' discipline and year of study. A quantitative analysis was carried out, including descriptive analysis of contingency tables, chi-squared tests, t-tests, and multiple linear regressions. The results…
ERIC Educational Resources Information Center
Duxbury, Mark
2004-01-01
An enzymatic laboratory experiment based on the analysis of serum is described that is suitable for students of clinical chemistry. The experiment incorporates an introduction to mathematical method-comparison techniques in which three different clinical glucose analysis methods are compared using linear regression and Bland-Altman difference…
NASA Astrophysics Data System (ADS)
George, Anna Ray Bayless
A study was conducted to determine the relationship between the credentials held by science teachers who taught at a school that administered the Science Texas Assessment on Knowledge and Skills (Science TAKS), the state standardized exam in science, at grade 11 and student performance on a state standardized exam in science administered in grade 11. Years of teaching experience, teacher certification type(s), highest degree level held, teacher and school demographic information, and the percentage of students who met the passing standard on the Science TAKS were obtained through a public records request to the Texas Education Agency (TEA) and the State Board for Educator Certification (SBEC). Analysis was performed through the use of canonical correlation analysis and multiple linear regression analysis. The results of the multiple linear regression analysis indicate that a larger percentage of students met the passing standard on the Science TAKS state attended schools in which a large portion of the high school science teachers held post baccalaureate degrees, elementary and physical science certifications, and had 11-20 years of teaching experience.
NASA Astrophysics Data System (ADS)
Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari
2018-01-01
Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg/day) in Jenderan catchment area.
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
ESTER HYDROLYSIS RATE CONSTANT PREDICTION FROM INFRARED INTERFEROGRAMS
A method for predicting reactivity parameters of organic chemicals from spectroscopic data is being developed to assist in assessing the environmental fate of pollutants. he prototype system, which employs multiple linear regression analysis using selected points from the Fourier...
A Regression Framework for Effect Size Assessments in Longitudinal Modeling of Group Differences
Feingold, Alan
2013-01-01
The use of growth modeling analysis (GMA)--particularly multilevel analysis and latent growth modeling--to test the significance of intervention effects has increased exponentially in prevention science, clinical psychology, and psychiatry over the past 15 years. Model-based effect sizes for differences in means between two independent groups in GMA can be expressed in the same metric (Cohen’s d) commonly used in classical analysis and meta-analysis. This article first reviews conceptual issues regarding calculation of d for findings from GMA and then introduces an integrative framework for effect size assessments that subsumes GMA. The new approach uses the structure of the linear regression model, from which effect sizes for findings from diverse cross-sectional and longitudinal analyses can be calculated with familiar statistics, such as the regression coefficient, the standard deviation of the dependent measure, and study duration. PMID:23956615
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
An Expert System for the Evaluation of Cost Models
1990-09-01
contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru
2017-09-01
Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, the predictive accuracy may not be satisfactory. The objective of this study was to elucidate the predictive accuracy of a method of calculating motor Functional Independence Measure (mFIM) at discharge from mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. By including the predicted mFIM effectiveness obtained through multiple regression analysis in this formula, we obtained the predicted mFIM at discharge (A). We also used multiple regression analysis to directly predict mFIM at discharge (B). The correlation between the predicted and the measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from mFIM effectiveness predicted by multiple regression analysis had a higher degree of predictive accuracy of mFIM at discharge than that directly predicted. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Xuntao; Feng, Jianhu; Wang, Hu; Hong, Shidi; Zheng, Supei
2018-03-01
A three-dimensional finite element box girder bridge and its asphalt concrete deck pavement were established by ANSYS software, and the interlayer bonding condition of asphalt concrete deck pavement was assumed to be contact bonding condition. Orthogonal experimental design is used to arrange the testing plans of material parameters, and an evaluation of the effect of different material parameters in the mechanical response of asphalt concrete surface layer was conducted by multiple linear regression model and using the results from the finite element analysis. Results indicated that stress regression equations can well predict the stress of the asphalt concrete surface layer, and elastic modulus of waterproof layer has a significant influence on stress values of asphalt concrete surface layer.
Technical note: A linear model for predicting δ13 Cprotein.
Pestle, William J; Hubbe, Mark; Smith, Erin K; Stevenson, Joseph M
2015-08-01
Development of a model for the prediction of δ(13) Cprotein from δ(13) Ccollagen and Δ(13) Cap-co . Model-generated values could, in turn, serve as "consumer" inputs for multisource mixture modeling of paleodiet. Linear regression analysis of previously published controlled diet data facilitated the development of a mathematical model for predicting δ(13) Cprotein (and an experimentally generated error term) from isotopic data routinely generated during the analysis of osseous remains (δ(13) Cco and Δ(13) Cap-co ). Regression analysis resulted in a two-term linear model (δ(13) Cprotein (%) = (0.78 × δ(13) Cco ) - (0.58× Δ(13) Cap-co ) - 4.7), possessing a high R-value of 0.93 (r(2) = 0.86, P < 0.01), and experimentally generated error terms of ±1.9% for any predicted individual value of δ(13) Cprotein . This model was tested using isotopic data from Formative Period individuals from northern Chile's Atacama Desert. The model presented here appears to hold significant potential for the prediction of the carbon isotope signature of dietary protein using only such data as is routinely generated in the course of stable isotope analysis of human osseous remains. These predicted values are ideal for use in multisource mixture modeling of dietary protein source contribution. © 2015 Wiley Periodicals, Inc.
Singh, Jagmahender; Pathak, R K; Chavali, Krishnadutt H
2011-03-20
Skeletal height estimation from regression analysis of eight sternal lengths in the subjects of Chandigarh zone of Northwest India is the topic of discussion in this study. Analysis of eight sternal lengths (length of manubrium, length of mesosternum, combined length of manubrium and mesosternum, total sternal length and first four intercostals lengths of mesosternum) measured from 252 male and 91 female sternums obtained at postmortems revealed that mean cadaver stature and sternal lengths were more in North Indians and males than the South Indians and females. Except intercostal lengths, all the sternal lengths were positively correlated with stature of the deceased in both sexes (P < 0.001). The multiple regression analysis of sternal lengths was found more useful than the linear regression for stature estimation. Using multivariate regression analysis, the combined length of manubrium and mesosternum in both sexes and the length of manubrium along with 2nd and 3rd intercostal lengths of mesosternum in males were selected as best estimators of stature. Nonetheless, the stature of males can be predicted with SEE of 6.66 (R(2) = 0.16, r = 0.318) from combination of MBL+BL_3+LM+BL_2, and in females from MBL only, it can be estimated with SEE of 6.65 (R(2) = 0.10, r = 0.318), whereas from the multiple regression analysis of pooled data, stature can be known with SEE of 6.97 (R(2) = 0.387, r = 575) from the combination of MBL+LM+BL_2+TSL+BL_3. The R(2) and F-ratio were found to be statistically significant for almost all the variables in both the sexes, except 4th intercostal length in males and 2nd to 4th intercostal lengths in females. The 'major' sternal lengths were more useful than the 'minor' ones for stature estimation The universal regression analysis used by Kanchan et al. [39] when applied to sternal lengths, gave satisfactory estimates of stature for males only but female stature was comparatively better estimated from simple linear regressions. But they are not proposed for the subjects of known sex, as they underestimate the male and overestimate female stature. However, intercostal lengths were found to be the poor estimators of stature (P < 0.05). And also sternal lengths exhibit weaker correlation coefficients and higher standard errors of estimate. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.
Arsenyev, P A; Trezvov, V V; Saratovskaya, N V
1997-01-01
This work represents a method, which allows to determine phase composition of calcium hydroxylapatite basing on its infrared spectrum. The method uses factor analysis of the spectral data of calibration set of samples to determine minimal number of factors required to reproduce the spectra within experimental error. Multiple linear regression is applied to establish correlation between factor scores of calibration standards and their properties. The regression equations can be used to predict the property value of unknown sample. The regression model was built for determination of beta-tricalcium phosphate content in hydroxylapatite. Statistical estimation of quality of the model was carried out. Application of the factor analysis on spectral data allows to increase accuracy of beta-tricalcium phosphate determination and expand the range of determination towards its less concentration. Reproducibility of results is retained.
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Functional mixture regression.
Yao, Fang; Fu, Yuejiao; Lee, Thomas C M
2011-04-01
In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.
Bertrand-Krajewski, J L
2004-01-01
In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers, by means of sensor and site specific empirical equations established by linear regression of on-site turbidity Tvalues with TSS concentrations C measured in corresponding samples. As the ordinary least-squares method is not able to account for measurement uncertainties in both T and C variables, an appropriate regression method is used to solve this difficulty and to evaluate correctly the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of variances and covariance in the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. In order to show how the established regression could be used, an independent 24 hours long dry weather turbidity data series recorded at 2 min time interval is used, transformed into estimated TSS concentrations, and compared to TSS concentrations measured in samples. The comparison appears as satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.
Robbins, Blaine
2013-01-01
Sociologists, political scientists, and economists all suggest that culture plays a pivotal role in the development of large-scale cooperation. In this study, I used generalized trust as a measure of culture to explore if and how culture impacts intentional homicide, my operationalization of cooperation. I compiled multiple cross-national data sets and used pooled time-series linear regression, single-equation instrumental-variables linear regression, and fixed- and random-effects estimation techniques on an unbalanced panel of 118 countries and 232 observations spread over a 15-year time period. Results suggest that culture and large-scale cooperation form a tenuous relationship, while economic factors such as development, inequality, and geopolitics appear to drive large-scale cooperation.
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
Linear regression models and k-means clustering for statistical analysis of fNIRS data
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-01-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751
A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection
Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B
2015-01-01
Summary We describe a simple, computationally effcient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
Evaluating and Improving the SAMA (Segmentation Analysis and Market Assessment) Recruiting Model
2015-06-01
and rewarding me with your love every day. xx THIS PAGE INTENTIONALLY LEFT BLANK 1 I. INTRODUCTION A. THE UNITED STATES ARMY RECRUITING...the relationship between the calculated SAMA potential and the actual 2014 performance. The scatterplot in Figure 8 shows a strong linear... relationship between the SAMA calculated potential and the contracting achievement for 2014, with an R-squared value of 0.871. Simple Linear Regression of
Montoye, Alexander H K; Begum, Munni; Henning, Zachary; Pfeiffer, Karin A
2017-02-01
This study had three purposes, all related to evaluating energy expenditure (EE) prediction accuracy from body-worn accelerometers: (1) compare linear regression to linear mixed models, (2) compare linear models to artificial neural network models, and (3) compare accuracy of accelerometers placed on the hip, thigh, and wrists. Forty individuals performed 13 activities in a 90 min semi-structured, laboratory-based protocol. Participants wore accelerometers on the right hip, right thigh, and both wrists and a portable metabolic analyzer (EE criterion). Four EE prediction models were developed for each accelerometer: linear regression, linear mixed, and two ANN models. EE prediction accuracy was assessed using correlations, root mean square error (RMSE), and bias and was compared across models and accelerometers using repeated-measures analysis of variance. For all accelerometer placements, there were no significant differences for correlations or RMSE between linear regression and linear mixed models (correlations: r = 0.71-0.88, RMSE: 1.11-1.61 METs; p > 0.05). For the thigh-worn accelerometer, there were no differences in correlations or RMSE between linear and ANN models (ANN-correlations: r = 0.89, RMSE: 1.07-1.08 METs. Linear models-correlations: r = 0.88, RMSE: 1.10-1.11 METs; p > 0.05). Conversely, one ANN had higher correlations and lower RMSE than both linear models for the hip (ANN-correlation: r = 0.88, RMSE: 1.12 METs. Linear models-correlations: r = 0.86, RMSE: 1.18-1.19 METs; p < 0.05), and both ANNs had higher correlations and lower RMSE than both linear models for the wrist-worn accelerometers (ANN-correlations: r = 0.82-0.84, RMSE: 1.26-1.32 METs. Linear models-correlations: r = 0.71-0.73, RMSE: 1.55-1.61 METs; p < 0.01). For studies using wrist-worn accelerometers, machine learning models offer a significant improvement in EE prediction accuracy over linear models. Conversely, linear models showed similar EE prediction accuracy to machine learning models for hip- and thigh-worn accelerometers and may be viable alternative modeling techniques for EE prediction for hip- or thigh-worn accelerometers.
Ecologic regression analysis and the study of the influence of air quality on mortality.
Selvin, S; Merrill, D; Wong, L; Sacks, S T
1984-01-01
This presentation focuses entirely on the use and evaluation of regression analysis applied to ecologic data as a method to study the effects of ambient air pollution on mortality rates. Using extensive national data on mortality, air quality and socio-economic status regression analyses are used to study the influence of air quality on mortality. The analytic methods and data are selected in such a way that direct comparisons can be made with other ecologic regression studies of mortality and air quality. Analyses are performed by use of two types of geographic areas, age-specific mortality of both males and females and three pollutants (total suspended particulates, sulfur dioxide and nitrogen dioxide). The overall results indicate no persuasive evidence exists of a link between air quality and general mortality levels. Additionally, a lack of consistency between the present results and previous published work is noted. Overall, it is concluded that linear regression analysis applied to nationally collected ecologic data cannot be used to usefully infer a causal relationship between air quality and mortality which is in direct contradiction to other major published studies. PMID:6734568
NASA Astrophysics Data System (ADS)
Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.
2014-09-01
Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (Radj2=0.632, the root-mean-squared error of estimated AGB was 13.3 Mg ha-1 (or 37.7%), resulting from cross-validation), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
Linkages between benthic macroinvertebrate assemblages and landscape stressors in the US Great Lakes
We used multiple linear regression analysis to investigate relationships between benthic macroinvertebrate assemblages in the nearshore region of the Laurentian Great Lakes and landscape characteristics in adjacent watersheds. Benthic invertebrate data were obtained from the 201...
Interior car noise created by textured pavement surfaces : final report.
DOT National Transportation Integrated Search
1975-01-01
Because of widespread concern about the effect of textured pavement surfaces on interior car noise, sound pressure levels (SPL) were measured inside a test vehicle as it traversed 21 pavements with various textures. A linear regression analysis run o...
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values so that the data become high dimensional data suffering from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
"Mad or bad?": burden on caregivers of patients with personality disorders.
Bauer, Rita; Döring, Antje; Schmidt, Tanja; Spießl, Hermann
2012-12-01
The burden on caregivers of patients with personality disorders is often greatly underestimated or completely disregarded. Possibilities for caregiver support have rarely been assessed. Thirty interviews were conducted with caregivers of such patients to assess illness-related burden. Responses were analyzed with a mixed method of qualitative and quantitative analysis in a sequential design. Patient and caregiver data, including sociodemographic and disease-related variables, were evaluated with regression analysis and regression trees. Caregiver statements (n = 404) were summarized into 44 global statements. The most frequent global statements were worries about the burden on other family members (70.0%), poor cooperation with clinical centers and other institutions (60.0%), financial burden (56.7%), worry about the patient's future (53.3%), and dissatisfaction with the patient's treatment and rehabilitation (53.3%). Linear regression and regression tree analysis identified predictors for more burdened caregivers. Caregivers of patients with personality disorders experience a variety of burdens, some disorder specific. Yet these caregivers often receive little attention or support.
Control Variate Selection for Multiresponse Simulation.
1987-05-01
M. H. Knuter, Applied Linear Regression Mfodels, Richard D. Erwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F., Probability, Allyn and Bacon...1982. Neter, J., V. Wasserman, and M. H. Knuter, Applied Linear Regression .fodels, Richard D. Erwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F...Aspects of J%,ultivariate Statistical Theory, John Wiley and Sons, New York, New York, 1982. dY Neter, J., W. Wasserman, and M. H. Knuter, Applied Linear Regression Mfodels
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
High correlations between MRI brain volume measurements based on NeuroQuant® and FreeSurfer.
Ross, David E; Ochs, Alfred L; Tate, David F; Tokac, Umit; Seabaugh, John; Abildskov, Tracy J; Bigler, Erin D
2018-05-30
NeuroQuant ® (NQ) and FreeSurfer (FS) are commonly used computer-automated programs for measuring MRI brain volume. Previously they were reported to have high intermethod reliabilities but often large intermethod effect size differences. We hypothesized that linear transformations could be used to reduce the large effect sizes. This study was an extension of our previously reported study. We performed NQ and FS brain volume measurements on 60 subjects (including normal controls, patients with traumatic brain injury, and patients with Alzheimer's disease). We used two statistical approaches in parallel to develop methods for transforming FS volumes into NQ volumes: traditional linear regression, and Bayesian linear regression. For both methods, we used regression analyses to develop linear transformations of the FS volumes to make them more similar to the NQ volumes. The FS-to-NQ transformations based on traditional linear regression resulted in effect sizes which were small to moderate. The transformations based on Bayesian linear regression resulted in all effect sizes being trivially small. To our knowledge, this is the first report describing a method for transforming FS to NQ data so as to achieve high reliability and low effect size differences. Machine learning methods like Bayesian regression may be more useful than traditional methods. Copyright © 2018 Elsevier B.V. All rights reserved.
Yokoo, Takeshi; Serai, Suraj D; Pirasteh, Ali; Bashir, Mustafa R; Hamilton, Gavin; Hernando, Diego; Hu, Houchun H; Hetterich, Holger; Kühn, Jens-Peter; Kukuk, Guido M; Loomba, Rohit; Middleton, Michael S; Obuchowski, Nancy A; Song, Ji Soo; Tang, An; Wu, Xinhuai; Reeder, Scott B; Sirlin, Claude B
2018-02-01
Purpose To determine the linearity, bias, and precision of hepatic proton density fat fraction (PDFF) measurements by using magnetic resonance (MR) imaging across different field strengths, imager manufacturers, and reconstruction methods. Materials and Methods This meta-analysis was performed in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A systematic literature search identified studies that evaluated the linearity and/or bias of hepatic PDFF measurements by using MR imaging (hereafter, MR imaging-PDFF) against PDFF measurements by using colocalized MR spectroscopy (hereafter, MR spectroscopy-PDFF) or the precision of MR imaging-PDFF. The quality of each study was evaluated by using the Quality Assessment of Studies of Diagnostic Accuracy 2 tool. De-identified original data sets from the selected studies were pooled. Linearity was evaluated by using linear regression between MR imaging-PDFF and MR spectroscopy-PDFF measurements. Bias, defined as the mean difference between MR imaging-PDFF and MR spectroscopy-PDFF measurements, was evaluated by using Bland-Altman analysis. Precision, defined as the agreement between repeated MR imaging-PDFF measurements, was evaluated by using a linear mixed-effects model, with field strength, imager manufacturer, reconstruction method, and region of interest as random effects. Results Twenty-three studies (1679 participants) were selected for linearity and bias analyses and 11 studies (425 participants) were selected for precision analyses. MR imaging-PDFF was linear with MR spectroscopy-PDFF (R 2 = 0.96). Regression slope (0.97; P < .001) and mean Bland-Altman bias (-0.13%; 95% limits of agreement: -3.95%, 3.40%) indicated minimal underestimation by using MR imaging-PDFF. MR imaging-PDFF was precise at the region-of-interest level, with repeatability and reproducibility coefficients of 2.99% and 4.12%, respectively. Field strength, imager manufacturer, and reconstruction method each had minimal effects on reproducibility. Conclusion MR imaging-PDFF has excellent linearity, bias, and precision across different field strengths, imager manufacturers, and reconstruction methods. © RSNA, 2017 Online supplemental material is available for this article. An earlier incorrect version of this article appeared online. This article was corrected on October 2, 2017.
Effect Size Measure and Analysis of Single Subject Designs
ERIC Educational Resources Information Center
Society for Research on Educational Effectiveness, 2013
2013-01-01
One of the vexing problems in the analysis of SSD is in the assessment of the effect of intervention. Serial dependence notwithstanding, the linear model approach that has been advanced involves, in general, the fitting of regression lines (or curves) to the set of observations within each phase of the design and comparing the parameters of these…
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Instructional Advice, Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2010-01-01
Undergraduate students (N = 97) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without instructional advice) x 2 (with or without time advice) x 2…
Thematic Mapper Analysis of Blue Oak (Quercus douglasii) in Central California
Paul A. Lefebvre Jr.; Frank W. Davis; Mark Borchert
1991-01-01
Digital Thematic Mapper (TM) satellite data from September 1986 and December 1985 were analyzed to determine seasonal reflectance properties of blue oak rangeland in the La Panza mountains of San Luis Obispo County. Linear regression analysis was conducted to examine relationships between TM reflectance and oak canopy cover, basal area, and site topographic variables....
Forecasting Container Throughput at the Doraleh Port in Djibouti through Time Series Analysis
NASA Astrophysics Data System (ADS)
Mohamed Ismael, Hawa; Vandyck, George Kobina
The Doraleh Container Terminal (DCT) located in Djibouti has been noted as the most technologically advanced container terminal on the African continent. DCT's strategic location at the crossroads of the main shipping lanes connecting Asia, Africa and Europe put it in a unique position to provide important shipping services to vessels plying that route. This paper aims to forecast container throughput through the Doraleh Container Port in Djibouti by Time Series Analysis. A selection of univariate forecasting models has been used, namely Triple Exponential Smoothing Model, Grey Model and Linear Regression Model. By utilizing the above three models and their combination, the forecast of container throughput through the Doraleh port was realized. A comparison of the different forecasting results of the three models, in addition to the combination forecast is then undertaken, based on commonly used evaluation criteria Mean Absolute Deviation (MAD) and Mean Absolute Percentage Error (MAPE). The study found that the Linear Regression forecasting Model was the best prediction method for forecasting the container throughput, since its forecast error was the least. Based on the regression model, a ten (10) year forecast for container throughput at DCT has been made.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
Roy, Banibrata; Ripstein, Ira; Perry, Kyle; Cohen, Barry
2016-01-01
To determine whether the pre-medical Grade Point Average (GPA), Medical College Admission Test (MCAT), Internal examinations (Block) and National Board of Medical Examiners (NBME) scores are correlated with and predict the Medical Council of Canada Qualifying Examination Part I (MCCQE-1) scores. Data from 392 admitted students in the graduating classes of 2010-2013 at University of Manitoba (UofM), College of Medicine was considered. Pearson's correlation to assess the strength of the relationship, multiple linear regression to estimate MCCQE-1 score and stepwise linear regression to investigate the amount of variance were employed. Complete data from 367 (94%) students were studied. The MCCQE-1 had a moderate-to-large positive correlation with NBME scores and Block scores but a low correlation with GPA and MCAT scores. The multiple linear regression model gives a good estimate of the MCCQE-1 (R2 =0.604). Stepwise regression analysis demonstrated that 59.2% of the variation in the MCCQE-1 was accounted for by the NBME, but only 1.9% by the Block exams, and negligible variation came from the GPA and the MCAT. Amongst all the examinations used at UofM, the NBME is most closely correlated with MCCQE-1.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Ochi, H; Ikuma, I; Toda, H; Shimada, T; Morioka, S; Moriyama, K
1989-12-01
In order to determine whether isovolumic relaxation period (IRP) reflects left ventricular relaxation under different afterload conditions, 17 anesthetized, open chest dogs were studied, and the left ventricular pressure decay time constant (T) was calculated. In 12 dogs, angiotensin II and nitroprusside were administered, with the heart rate constant at 90 beats/min. Multiple linear regression analysis showed that the aortic dicrotic notch pressure (AoDNP) and T were major determinants of IRP, while left ventricular end-diastolic pressure was a minor determinant. Multiple linear regression analysis, correlating T with IRP and AoDNP, did not further improve the correlation coefficient compared with that between T and IRP. We concluded that correction of the IRP by AoDNP is not necessary to predict T from additional multiple linear regression. The effects of ascending aortic constriction or angiotensin II on IRP were examined in five dogs, after pretreatment with propranolol. Aortic constriction caused a significant decrease in IRP and T, while angiotensin II produced a significant increase in IRP and T. IRP was affected by the change of afterload. However, the IRP and T values were always altered in the same direction. These results demonstrate that IRP is substituted for T and it reflects left ventricular relaxation even in different afterload conditions. We conclude that IRP is a simple parameter easily used to evaluate left ventricular relaxation in clinical situations.
Troutman, Brent M.
1982-01-01
Errors in runoff prediction caused by input data errors are analyzed by treating precipitation-runoff models as regression (conditional expectation) models. Independent variables of the regression consist of precipitation and other input measurements; the dependent variable is runoff. In models using erroneous input data, prediction errors are inflated and estimates of expected storm runoff for given observed input variables are biased. This bias in expected runoff estimation results in biased parameter estimates if these parameter estimates are obtained by a least squares fit of predicted to observed runoff values. The problems of error inflation and bias are examined in detail for a simple linear regression of runoff on rainfall and for a nonlinear U.S. Geological Survey precipitation-runoff model. Some implications for flood frequency analysis are considered. A case study using a set of data from Turtle Creek near Dallas, Texas illustrates the problems of model input errors.
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Weather Impact on Airport Arrival Meter Fix Throughput
NASA Technical Reports Server (NTRS)
Wang, Yao
2017-01-01
Time-based flow management provides arrival aircraft schedules based on arrival airport conditions, airport capacity, required spacing, and weather conditions. In order to meet a scheduled time at which arrival aircraft can cross an airport arrival meter fix prior to entering the airport terminal airspace, air traffic controllers make regulations on air traffic. Severe weather may create an airport arrival bottleneck if one or more of airport arrival meter fixes are partially or completely blocked by the weather and the arrival demand has not been reduced accordingly. Under these conditions, aircraft are frequently being put in holding patterns until they can be rerouted. A model that predicts the weather impacted meter fix throughput may help air traffic controllers direct arrival flows into the airport more efficiently, minimizing arrival meter fix congestion. This paper presents an analysis of air traffic flows across arrival meter fixes at the Newark Liberty International Airport (EWR). Several scenarios of weather impacted EWR arrival fix flows are described. Furthermore, multiple linear regression and regression tree ensemble learning approaches for translating multiple sector Weather Impacted Traffic Indexes (WITI) to EWR arrival meter fix throughputs are examined. These weather translation models are developed and validated using the EWR arrival flight and weather data for the period of April-September in 2014. This study also compares the performance of the regression tree ensemble with traditional multiple linear regression models for estimating the weather impacted throughputs at each of the EWR arrival meter fixes. For all meter fixes investigated, the results from the regression tree ensemble weather translation models show a stronger correlation between model outputs and observed meter fix throughputs than that produced from multiple linear regression method.
Almalik, Osama; Nijhuis, Michiel B; van den Heuvel, Edwin R
2014-01-01
Shelf-life estimation usually requires that at least three registration batches are tested for stability at multiple storage conditions. The shelf-life estimates are often obtained by linear regression analysis per storage condition, an approach implicitly suggested by ICH guideline Q1E. A linear regression analysis combining all data from multiple storage conditions was recently proposed in the literature when variances are homogeneous across storage conditions. The combined analysis is expected to perform better than the separate analysis per storage condition, since pooling data would lead to an improved estimate of the variation and higher numbers of degrees of freedom, but this is not evident for shelf-life estimation. Indeed, the two approaches treat the observed initial batch results, the intercepts in the model, and poolability of batches differently, which may eliminate or reduce the expected advantage of the combined approach with respect to the separate approach. Therefore, a simulation study was performed to compare the distribution of simulated shelf-life estimates on several characteristics between the two approaches and to quantify the difference in shelf-life estimates. In general, the combined statistical analysis does estimate the true shelf life more consistently and precisely than the analysis per storage condition, but it did not outperform the separate analysis in all circumstances.
Shen, Minxue; Tan, Hongzhuan; Zhou, Shujin; Retnakaran, Ravi; Smith, Graeme N.; Davidge, Sandra T.; Trasler, Jacquetta; Walker, Mark C.; Wen, Shi Wu
2016-01-01
Background It has been reported that higher folate intake from food and supplementation is associated with decreased blood pressure (BP). The association between serum folate concentration and BP has been examined in few studies. We aim to examine the association between serum folate and BP levels in a cohort of young Chinese women. Methods We used the baseline data from a pre-conception cohort of women of childbearing age in Liuyang, China, for this study. Demographic data were collected by structured interview. Serum folate concentration was measured by immunoassay, and homocysteine, blood glucose, triglyceride and total cholesterol were measured through standardized clinical procedures. Multiple linear regression and principal component regression model were applied in the analysis. Results A total of 1,532 healthy normotensive non-pregnant women were included in the final analysis. The mean concentration of serum folate was 7.5 ± 5.4 nmol/L and 55% of the women presented with folate deficiency (< 6.8 nmol/L). Multiple linear regression and principal component regression showed that serum folate levels were inversely associated with systolic and diastolic BP, after adjusting for demographic, anthropometric, and biochemical factors. Conclusions Serum folate is inversely associated with BP in non-pregnant women of childbearing age with high prevalence of folate deficiency. PMID:27182603
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
Chun, Hyonho; Keleş, Sündüz
2010-01-01
Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611
Virtual Beach version 2.2 (VB 2.2) is a decision support tool. It is designed to construct site-specific Multi-Linear Regression (MLR) models to predict pathogen indicator levels (or fecal indicator bacteria, FIB) at recreational beaches. MLR analysis has outperformed persisten...
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.
2015-07-15
Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST...34Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors
NASA Astrophysics Data System (ADS)
Jintao, Xue; Yufei, Liu; Liming, Ye; Chunyan, Li; Quanwei, Yang; Weiying, Wang; Yun, Jing; Minxiang, Zhang; Peng, Li
2018-01-01
Near-Infrared Spectroscopy (NIRS) was first used to develop a method for rapid and simultaneous determination of 5 active alkaloids (berberine, coptisine, palmatine, epiberberine and jatrorrhizine) in 4 parts (rhizome, fibrous root, stem and leaf) of Coptidis Rhizoma. A total of 100 samples from 4 main places of origin were collected and studied. With HPLC analysis values as calibration reference, the quantitative analysis of 5 marker components was performed by two different modeling methods, partial least-squares (PLS) regression as linear regression and artificial neural networks (ANN) as non-linear regression. The results indicated that the 2 types of models established were robust, accurate and repeatable for five active alkaloids, and the ANN models was more suitable for the determination of berberine, coptisine and palmatine while the PLS model was more suitable for the analysis of epiberberine and jatrorrhizine. The performance of the optimal models was achieved as follows: the correlation coefficient (R) for berberine, coptisine, palmatine, epiberberine and jatrorrhizine was 0.9958, 0.9956, 0.9959, 0.9963 and 0.9923, respectively; the root mean square error of validation (RMSEP) was 0.5093, 0.0578, 0.0443, 0.0563 and 0.0090, respectively. Furthermore, for the comprehensive exploitation and utilization of plant resource of Coptidis Rhizoma, the established NIR models were used to analysis the content of 5 active alkaloids in 4 parts of Coptidis Rhizoma and 4 main origin of places. This work demonstrated that NIRS may be a promising method as routine screening for off-line fast analysis or on-line quality assessment of traditional Chinese medicine (TCM).
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Ernst, Anja F; Albers, Casper J
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.
Ernst, Anja F.
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971
Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David
2017-10-01
Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P < .001). Means of software machine-derived values differed significantly from actual PLT yield, 4.72 × 10 11 vs.6.12 × 10 11 , respectively, (P < .001). The following equation was developed to adjust these values: actual PLT yield= 0.221 + (1.254 × theoretical platelet yield). ROC curve model showed an optimal apheresis device software prediction cut-off of 4.65 × 10 11 to obtain a DP, with a sensitivity of 82.2%, specificity of 93.3%, and an area under the curve (AUC) of 0.909. Trima Accel v6.0 software consistently underestimated PLT yields. Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Donroman, T.; Chesoh, S.; Lim, A.
2018-04-01
This study aimed to investigate the variation patterns of fish fingerling abundance based on month, year and sampling site. Monthly collecting data set of the Na Thap tidal river of southern Thailand, were obtained from June 2005 to October 2015. The square root transformation was employed for maintaining the fingerling data normality. Factor analysis was applied for clustering number of fingerling species and multiple linear regression was used to examine the association between fingerling density and year, month and site. Results from factor analysis classified fingerling into 3 factors based on saline preference; saline water, freshwater and ubiquitous species. The results showed a statistically high significant relation between fingerling density, month, year and site. Abundance of saline water and ubiquitous fingerling density showed similar pattern. Downstream site presented highest fingerling density whereas almost of freshwater fingerling occurred in upstream. This finding confirmed that factor analysis and the general linear regression method can be used as an effective tool for predicting and monitoring wild fingerling density in order to sustain fish stock management.
Huang, Li-Shan; Myers, Gary J.; Davidson, Philip W.; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W.; Cernichiari, Elsa; Shamlaye, Conrad F.; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W.
2007-01-01
Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age nine years. The analyses for the most recent nine-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated nonlinearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of twenty-one endpoints available at age nine years, we chose the Weschler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other nine-year endpoints that in the linear analysis has a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels. PMID:17942158
Estimating linear temporal trends from aggregated environmental monitoring data
Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.
2017-01-01
Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.
Approximate Probabilistic Methods for Survivability/Vulnerability Analysis of Strategic Structures.
1978-07-15
weapon yield, in kilotons; K = energy coupling factor; C = coefficient determined from linear regression; a, b = exponents determined from linear...hn(l + .582 00 = 0.54 In the case of the applied pressure, according to Perret and Bass (1975), the variabilities in the exponents a and b of Eq. 32...ATTN: WESSF, L. Ingram ATTN: ATC-T ATTN: Library ATTN: F. Brown BMD Systems Command ATTN: J. Strange Deoartment of the Army ATTN: BMDSC-H, N. Hurst
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
Analysis of regression methods for solar activity forecasting
NASA Technical Reports Server (NTRS)
Lundquist, C. A.; Vaughan, W. W.
1979-01-01
The paper deals with the potential use of the most recent solar data to project trends in the next few years. Assuming that a mode of solar influence on weather can be identified, advantageous use of that knowledge presumably depends on estimating future solar activity. A frequently used technique for solar cycle predictions is a linear regression procedure along the lines formulated by McNish and Lincoln (1949). The paper presents a sensitivity analysis of the behavior of such regression methods relative to the following aspects: cycle minimum, time into cycle, composition of historical data base, and unnormalized vs. normalized solar cycle data. Comparative solar cycle forecasts for several past cycles are presented as to these aspects of the input data. Implications for the current cycle, No. 21, are also given.
Hao, Yong; Sun, Xu-Dong; Yang, Qiang
2012-12-01
Variables selection strategy combined with local linear embedding (LLE) was introduced for the analysis of complex samples by using near infrared spectroscopy (NIRS). Three methods include Monte Carlo uninformation variable elimination (MCUVE), successive projections algorithm (SPA) and MCUVE connected with SPA were used for eliminating redundancy spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling complex samples. The results shown that MCUVE can both extract effective informative variables and improve the precision of models. Compared with PLSR models, LLE-PLSR models can achieve more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.
Predicting musically induced emotions from physiological inputs: linear and neural network models.
Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M
2013-01-01
Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants-heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
Li, Zhenghua; Cheng, Fansheng; Xia, Zhining
2011-01-01
The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the powerful predictive abilities of the optimization model appraised by leave-one-out cross-validation showed that the optimization model with the correlation coefficient (R) of 0.994 7 and the cross-validated correlation coefficient (Rcv) of 0.994 0 possessed the best statistical quality. Furthermore, when the 114 PASHs compounds were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed our models possesses almost equal statistical quality, the very similar regression coefficients and the good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R 2 ), using R 2 as the primary metric of assay agreement. However, the use of R 2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
NASA Astrophysics Data System (ADS)
Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah
2017-08-01
In regression analysis, missing covariate data has been a common problem. Many researchers use ad hoc methods to overcome this problem due to the ease of implementation. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding regarding missing data concept that can assist the researcher to select the appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated using an underlying multivariate normal distribution and the dependent variable was generated as a combination of explanatory variables. Missing values in covariate were simulated using a mechanism called missing at random (MAR). Four levels of missingness (10%, 20%, 30% and 40%) were imposed. ML and MI techniques available within SAS software were investigated. A linear regression analysis was fitted and the model performance measures; MSE, and R-Squared were obtained. Results of the analysis showed that MI is superior in handling missing data with highest R-Squared and lowest MSE when percent of missingness is less than 30%. Both methods are unable to handle larger than 30% level of missingness.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
2017-10-01
ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...author(s) and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other documentation...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID
Discrimination and Acculturative Stress among First-Generation Dominicans
ERIC Educational Resources Information Center
Dawson, Beverly Araujo; Panchanadeswaran, Subadra
2010-01-01
The present study examined the relationship between discriminatory experiences and acculturative stress levels among a sample of 283 Dominican immigrants. Findings from a linear regression analysis revealed that experiences of daily racial discrimination and major racist events were significant predictors of acculturative stress after controlling…
Hobbs, Brian P.; Sargent, Daniel J.; Carlin, Bradley P.
2014-01-01
Assessing between-study variability in the context of conventional random-effects meta-analysis is notoriously difficult when incorporating data from only a small number of historical studies. In order to borrow strength, historical and current data are often assumed to be fully homogeneous, but this can have drastic consequences for power and Type I error if the historical information is biased. In this paper, we propose empirical and fully Bayesian modifications of the commensurate prior model (Hobbs et al., 2011) extending Pocock (1976), and evaluate their frequentist and Bayesian properties for incorporating patient-level historical data using general and generalized linear mixed regression models. Our proposed commensurate prior models lead to preposterior admissible estimators that facilitate alternative bias-variance trade-offs than those offered by pre-existing methodologies for incorporating historical data from a small number of historical studies. We also provide a sample analysis of a colon cancer trial comparing time-to-disease progression using a Weibull regression model. PMID:24795786
NASA Astrophysics Data System (ADS)
Hirst, Jonathan D.; King, Ross D.; Sternberg, Michael J. E.
1994-08-01
Neural networks and inductive logic programming (ILP) have been compared to linear regression for modelling the QSAR of the inhibition of E. coli dihydrofolate reductase (DHFR) by 2,4-diamino-5-(substitured benzyl)pyrimidines, and, in the subsequent paper [Hirst, J.D., King, R.D. and Sternberg, M.J.E., J. Comput.-Aided Mol. Design, 8 (1994) 421], the inhibition of rodent DHFR by 2,4-diamino-6,6-dimethyl-5-phenyl-dihydrotriazines. Cross-validation trials provide a statistically rigorous assessment of the predictive capabilities of the methods, with training and testing data selected randomly and all the methods developed using identical training data. For the ILP analysis, molecules are represented by attributes other than Hansch parameters. Neural networks and ILP perform better than linear regression using the attribute representation, but the difference is not statistically significant. The major benefit from the ILP analysis is the formulation of understandable rules relating the activity of the inhibitors to their chemical structure.
Analysis and generation of groundwater concentration time series
NASA Astrophysics Data System (ADS)
Crăciun, Maria; Vamoş, Călin; Suciu, Nicolae
2018-01-01
Concentration time series are provided by simulated concentrations of a nonreactive solute transported in groundwater, integrated over the transverse direction of a two-dimensional computational domain and recorded at the plume center of mass. The analysis of a statistical ensemble of time series reveals subtle features that are not captured by the first two moments which characterize the approximate Gaussian distribution of the two-dimensional concentration fields. The concentration time series exhibit a complex preasymptotic behavior driven by a nonstationary trend and correlated fluctuations with time-variable amplitude. Time series with almost the same statistics are generated by successively adding to a time-dependent trend a sum of linear regression terms, accounting for correlations between fluctuations around the trend and their increments in time, and terms of an amplitude modulated autoregressive noise of order one with time-varying parameter. The algorithm generalizes mixing models used in probability density function approaches. The well-known interaction by exchange with the mean mixing model is a special case consisting of a linear regression with constant coefficients.
Comparative analysis of used car price evaluation models
NASA Astrophysics Data System (ADS)
Chen, Chuancan; Hao, Lulu; Xu, Cong
2017-05-01
An accurate used car price evaluation is a catalyst for the healthy development of used car market. Data mining has been applied to predict used car price in several articles. However, little is studied on the comparison of using different algorithms in used car price estimation. This paper collects more than 100,000 used car dealing records throughout China to do empirical analysis on a thorough comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car price in three different models: model for a certain car make, model for a certain car series and universal model. Results show that random forest has a stable but not ideal effect in price evaluation model for a certain car make, but it shows great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with less variables.
Statistical Tutorial | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. ST is designed as a follow up to Statistical Analysis of Research Data (SARD) held in April 2018. The tutorial will apply the general principles of statistical analysis of research data including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and Chi-Squared distribution.
ERIC Educational Resources Information Center
Beauchamp, Guy
2005-01-01
A study to present specific hypothesis that satisfactorily explain the boiling point of a number of molecules, CH[subscript w]F[subscript x]Cl[subscript y]Br[subscript z] having similar structure, and then analyze the model with the help of multiple linear regression (MLR), a data analysis tool. The MLR analysis was useful in selecting the…
Robbins, Blaine
2013-01-01
Sociologists, political scientists, and economists all suggest that culture plays a pivotal role in the development of large-scale cooperation. In this study, I used generalized trust as a measure of culture to explore if and how culture impacts intentional homicide, my operationalization of cooperation. I compiled multiple cross-national data sets and used pooled time-series linear regression, single-equation instrumental-variables linear regression, and fixed- and random-effects estimation techniques on an unbalanced panel of 118 countries and 232 observations spread over a 15-year time period. Results suggest that culture and large-scale cooperation form a tenuous relationship, while economic factors such as development, inequality, and geopolitics appear to drive large-scale cooperation. PMID:23527211
Structured functional additive regression in reproducing kernel Hilbert spaces.
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2014-06-01
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.
Missing-value estimation using linear and non-linear regression with Bayesian gene selection.
Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R
2003-11-22
Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
Kim, J; Nagano, Y; Furumai, H
2012-01-01
Easy-to-measure surrogate parameters for water quality indicators are needed for real time monitoring as well as for generating data for model calibration and validation. In this study, a novel linear regression model for estimating total nitrogen (TN) based on two surrogate parameters is proposed based on evaluation of pollutant loads flowing into a eutrophic lake. Based on their runoff characteristics during wet weather, electric conductivity (EC) and turbidity were selected as surrogates for particulate nitrogen (PN) and dissolved nitrogen (DN), respectively. Strong linear relationships were established between PN and turbidity and DN and EC, and both models subsequently combined for estimation of TN. This model was evaluated by comparison of estimated and observed TN runoff loads during rainfall events. This analysis showed that turbidity and EC are viable surrogates for PN and DN, respectively, and that the linear regression model for TN concentration was successful in estimating TN runoff loads during rainfall events and also under dry weather conditions.
NASA Astrophysics Data System (ADS)
Naguib, Ibrahim A.; Darwish, Hany W.
2012-02-01
A comparison between support vector regression (SVR) and Artificial Neural Networks (ANNs) multivariate regression methods is established showing the underlying algorithm for each and making a comparison between them to indicate the inherent advantages and limitations. In this paper we compare SVR to ANN with and without variable selection procedure (genetic algorithm (GA)). To project the comparison in a sensible way, the methods are used for the stability indicating quantitative analysis of mixtures of mebeverine hydrochloride and sulpiride in binary mixtures as a case study in presence of their reported impurities and degradation products (summing up to 6 components) in raw materials and pharmaceutical dosage form via handling the UV spectral data. For proper analysis, a 6 factor 5 level experimental design was established resulting in a training set of 25 mixtures containing different ratios of the interfering species. An independent test set consisting of 5 mixtures was used to validate the prediction ability of the suggested models. The proposed methods (linear SVR (without GA) and linear GA-ANN) were successfully applied to the analysis of pharmaceutical tablets containing mebeverine hydrochloride and sulpiride mixtures. The results manifest the problem of nonlinearity and how models like the SVR and ANN can handle it. The methods indicate the ability of the mentioned multivariate calibration models to deconvolute the highly overlapped UV spectra of the 6 components' mixtures, yet using cheap and easy to handle instruments like the UV spectrophotometer.
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained for each regression tree by taxa were: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxa and assess validity of resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data and graphical results indicated models were well fit to the data.
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Poisson Mixture Regression Models for Heart Disease Prediction
Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
On the design of classifiers for crop inventories
NASA Technical Reports Server (NTRS)
Heydorn, R. P.; Takacs, H. C.
1986-01-01
Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations in linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.
Albumin, a marker for post-operative myocardial damage in cardiac surgery.
van Beek, Dianne E C; van der Horst, Iwan C C; de Geus, A Fred; Mariani, Massimo A; Scheeren, Thomas W L
2018-06-06
Low serum albumin (SA) is a prognostic factor for poor outcome after cardiac surgery. The aim of this study was to estimate the association between pre-operative SA, early post-operative SA and postoperative myocardial injury. This single center cohort study included adult patients undergoing cardiac surgery during 4 consecutive years. Postoperative myocardial damage was defined by calculating the area under the curve (AUC) of troponin (Tn) values during the first 72 h after surgery and its association with SA analyzed using linear regression and with multivariable linear regression to account for patient related and procedural confounders. The association between SA and the secondary outcomes (peri-operative myocardial infarction [PMI], requiring ventilation >24 h, rhythm disturbances, 30-day mortality) was studied using (multivariable) log binomial regression analysis. In total 2757 patients were included. The mean pre-operative SA was 29 ± 13 g/l and the mean post-operative SA was 26 ± 6 g/l. Post-operative SA levels (on average 26 min after surgery) were inversely associated with postoperative myocardial damage in both univariable analysis (regression coefficient - 0.019, 95%CI -0.022/-0.015, p < 0.005) and after adjustment for patient related and surgical confounders (regression coefficient - 0.014 [95% CI -0.020/-0.008], p < 0.0005). Post-operative albumin levels were significantly correlated with the amount of postoperative myocardial damage in patients undergoing cardiac surgery independent of typical confounders. Copyright © 2018. Published by Elsevier Inc.
Arambasić, M B; Jatić-Slavković, D
2004-05-01
This paper presents the application of the regression analysis program and the program for comparing linear regressions (modified method for one-way, analysis of variance), writtens in BASIC program language, for instance, determination of content of Diclofenac-Sodium (active ingredient in DIKLOFEN injections, ampules á 75 mg/3 ml). Stability testing of Diclofenac-Sodium was done by isothermic method of accelerated aging at 4 different temperatures (30 degrees, 40 degrees, 50 degrees and 60 degrees C) as a function of time (4 different duration of treatment: (0-155, 0-145, 0-74 and 0-44 days). The decrease in stability (decrease in the mean value of the content of Diclofenac-Sodium (in %), at different temperatures as a function of time, is possible to describe by, linear dependance. According to the value for regression equation values, the times are assessed in which the content of Diclofenac-Sodium (in %) will decrease by 10%, of the initial value. The times are follows at 30 degrees C 761.02 days, at 40 degrees C 397.26 days, at 50 degrees C 201.96 days and at 60 degrees C 58.85 days. The estimated times (in days) in which the mean value for Diclofenac-Sodium content (in %) will by 10% of the initial values, as a junction of time, are most suitably described by 3rd order parabola. Based on the parameter values which describe the 3rd order parabola, the time was estimated in which Diclofenac-Sodium content mean value (in %) will fall by 10% of the initial one at average ambient temperatures of 20 degrees C and 25 degrees C. The times are: 1409.47 days (20 degrees C) and 1042.39 days (25 degrees C). Based on the value for Fischer's coefficien (F), the comparison of trenf of Diclofenac-Sodium content (in %) shows that, under the influence of different temperatures as a function of time, among them, depending on temperature value, there is: statistically very significant difference (P < .05) at 50 degrees C and lower toward 60 degrees C, i.e. statistically probably significant difference (P > 0.01) at 40 degrees C and lower towards 50 degrees C and there is no statistically significance difference (P > 0.05) at 30 degrees C towards 40 degrees C.
Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo
2011-03-04
Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Denli, H. H.; Koc, Z.
2015-12-01
Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis construct mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Considering regression analysis for real estate valuation, which are presented in real marketing process with its current characteristics and quantifiers, the method will help us to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors having the house, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Same price valued locations have been found and plotted on the map and equivalence curves have been drawn identifying the same valued zones as lines.
NASA Astrophysics Data System (ADS)
Karan, S.; Sebok, E.; Engesgaard, P. K.
2016-12-01
For identifying groundwater seepage locations in small streams within a headwater catchment, we present a method expanding on the linear regression of air and stream temperatures. Thus, by measuring the temperatures in dual-depth; in the stream column and at the streambed-water interface (SWI), we apply metrics from linear regression analysis of temperatures between air/stream and air/SWI (linear regression slope, intercept and coefficient of determination), and the daily mean temperatures (temperature variance and the average difference between the minimum and maximum daily temperatures). Our study show that using metrics from single-depth stream temperature measurements only are not sufficient to identify substantial groundwater seepage locations within a headwater stream. Conversely, comparing the metrics from dual-depth temperatures show significant differences so that at groundwater seepage locations, temperatures at the SWI, merely explain 43-75 % of the variation opposed to ≥91 % at the corresponding stream column temperatures. The figure showing a box-plot of the variation in daily mean temperature depict that at several locations there is great variation in the range the upper and lower loggers due to groundwater seepage. In general, the linear regression show that at these locations at the SWI, the slopes (<0.25) and intercepts (>6.5oC) are substantially lower and higher, while the mean diel amplitudes (<0.98oC) are decreased compared to remaining locations. The dual-depth approach was applied in a post-glacial fluvial setting, where metrics analyses overall corresponded to field measurements of groundwater fluxes deduced from vertical streambed temperatures and stream flow accretions. Thus, we propose a method reliably identifying groundwater seepage locations along streambed in such settings.
Billard, Hélène; Simon, Laure; Desnots, Emmanuelle; Sochard, Agnès; Boscher, Cécile; Riaublanc, Alain; Alexandre-Gouabau, Marie-Cécile; Boquien, Clair-Yves
2016-08-01
Human milk composition analysis seems essential to adapt human milk fortification for preterm neonates. The Miris human milk analyzer (HMA), based on mid-infrared methodology, is convenient for a unique determination of macronutrients. However, HMA measurements are not totally comparable with reference methods (RMs). The primary aim of this study was to compare HMA results with results from biochemical RMs for a large range of protein, fat, and carbohydrate contents and to establish a calibration adjustment. Human milk was fractionated in protein, fat, and skim milk by covering large ranges of protein (0-3 g/100 mL), fat (0-8 g/100 mL), and carbohydrate (5-8 g/100 mL). For each macronutrient, a calibration curve was plotted by linear regression using measurements obtained using HMA and RMs. For fat, 53 measurements were performed, and the linear regression equation was HMA = 0.79RM + 0.28 (R(2) = 0.92). For true protein (29 measurements), the linear regression equation was HMA = 0.9RM + 0.23 (R(2) = 0.98). For carbohydrate (15 measurements), the linear regression equation was HMA = 0.59RM + 1.86 (R(2) = 0.95). A homogenization step with a disruptor coupled to a sonication step was necessary to obtain better accuracy of the measurements. Good repeatability (coefficient of variation < 7%) and reproducibility (coefficient of variation < 17%) were obtained after calibration adjustment. New calibration curves were developed for the Miris HMA, allowing accurate measurements in large ranges of macronutrient content. This is necessary for reliable use of this device in individualizing nutrition for preterm newborns. © The Author(s) 2015.
Pattern Recognition Analysis of Age-Related Retinal Ganglion Cell Signatures in the Human Eye
Yoshioka, Nayuta; Zangerl, Barbara; Nivison-Smith, Lisa; Khuu, Sieu K.; Jones, Bryan W.; Pfeiffer, Rebecca L.; Marc, Robert E.; Kalloniatis, Michael
2017-01-01
Purpose To characterize macular ganglion cell layer (GCL) changes with age and provide a framework to assess changes in ocular disease. This study used data clustering to analyze macular GCL patterns from optical coherence tomography (OCT) in a large cohort of subjects without ocular disease. Methods Single eyes of 201 patients evaluated at the Centre for Eye Health (Sydney, Australia) were retrospectively enrolled (age range, 20–85); 8 × 8 grid locations obtained from Spectralis OCT macular scans were analyzed with unsupervised classification into statistically separable classes sharing common GCL thickness and change with age. The resulting classes and gridwise data were fitted with linear and segmented linear regression curves. Additionally, normalized data were analyzed to determine regression as a percentage. Accuracy of each model was examined through comparison of predicted 50-year-old equivalent macular GCL thickness for the entire cohort to a true 50-year-old reference cohort. Results Pattern recognition clustered GCL thickness across the macula into five to eight spatially concentric classes. F-test demonstrated segmented linear regression to be the most appropriate model for macular GCL change. The pattern recognition–derived and normalized model revealed less difference between the predicted macular GCL thickness and the reference cohort (average ± SD 0.19 ± 0.92 and −0.30 ± 0.61 μm) than a gridwise model (average ± SD 0.62 ± 1.43 μm). Conclusions Pattern recognition successfully identified statistically separable macular areas that undergo a segmented linear reduction with age. This regression model better predicted macular GCL thickness. The various unique spatial patterns revealed by pattern recognition combined with core GCL thickness data provide a framework to analyze GCL loss in ocular disease. PMID:28632847
Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H
2018-01-01
Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Hayes, Andrew F; Rockwood, Nicholas J
2017-11-01
There have been numerous treatments in the clinical research literature about various design, analysis, and interpretation considerations when testing hypotheses about mechanisms and contingencies of effects, popularly known as mediation and moderation analysis. In this paper we address the practice of mediation and moderation analysis using linear regression in the pages of Behaviour Research and Therapy and offer some observations and recommendations, debunk some popular myths, describe some new advances, and provide an example of mediation, moderation, and their integration as conditional process analysis using the PROCESS macro for SPSS and SAS. Our goal is to nudge clinical researchers away from historically significant but increasingly old school approaches toward modifications, revisions, and extensions that characterize more modern thinking about the analysis of the mechanisms and contingencies of effects. Copyright © 2016 Elsevier Ltd. All rights reserved.
Balabin, Roman M; Smirnov, Sergey V
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic techniques application, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopies, can be greatly improved by an appropriate feature selection choice. Copyright © 2011 Elsevier B.V. All rights reserved.
Does Group-Level Commitment Predict Employee Well-Being?: A Prospective Analysis.
Clausen, Thomas; Christensen, Karl Bang; Nielsen, Karina
2015-11-01
To investigate the links between group-level affective organizational commitment (AOC) and individual-level psychological well-being, self-reported sickness absence, and sleep disturbances. A total of 5085 care workers from 301 workgroups in the Danish eldercare services participated in both waves of the study (T1 [2005] and T2 [2006]). The three outcomes were analyzed using linear multilevel regression analysis, multilevel Poisson regression analysis, and multilevel logistic regression analysis, respectively. Group-level AOC (T1) significantly predicted individual-level psychological well-being, self-reported sickness absence, and sleep disturbances (T2). The association between group-level AOC (T1) and psychological well-being (T2) was fully mediated by individual-level AOC (T1), and the associations between group-level AOC (T1) and self-reported sickness absence and sleep disturbances (T2) were partially mediated by individual-level AOC (T1). Group-level AOC is an important predictor of employee well-being in contemporary health care organizations.
Bone Mass in Boys with Autism Spectrum Disorder
ERIC Educational Resources Information Center
Calarge, Chadi A.; Schlechte, Janet A.
2017-01-01
To examine bone mass in children and adolescents with autism spectrum disorders (ASD). Risperidone-treated 5 to 17 year-old males underwent anthropometric and bone measurements, using dual-energy X-ray absorptiometry and peripheral quantitative computed tomography. Multivariable linear regression analysis models examined whether skeletal outcomes…
Behavioral Cues in the Judgment of Marital Satisfaction: A Linear Regression Analysis
ERIC Educational Resources Information Center
Royce, W. Stephen; Weiss, Robert L.
1975-01-01
Forty undergraduate judges watched videotaped interactions of couples and rated their marital satisfaction based on certain behavioral cues. Results indicate: untrained judges were able to discriminate marital satisfaction/distress with significant validity; judges' ratings were correlated with couples' aversive behavior; and the actuarial…
A Practical Model for Forecasting New Freshman Enrollment during the Application Period.
ERIC Educational Resources Information Center
Paulsen, Michael B.
1989-01-01
A simple and effective model for forecasting freshman enrollment during the application period is presented step by step. The model requires minimal and readily available information, uses a simple linear regression analysis on a personal computer, and provides updated monthly forecasts. (MSE)
ASURV: Astronomical SURVival Statistics
NASA Astrophysics Data System (ADS)
Feigelson, E. D.; Nelson, P. I.; Isobe, T.; LaValley, M.
2014-06-01
ASURV (Astronomical SURVival Statistics) provides astronomy survival analysis for right- and left-censored data including the maximum-likelihood Kaplan-Meier estimator and several univariate two-sample tests, bivariate correlation measures, and linear regressions. ASURV is written in FORTRAN 77, and is stand-alone and does not call any specialized libraries.
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
Graphical Description of Johnson-Neyman Outcomes for Linear and Quadratic Regression Surfaces.
ERIC Educational Resources Information Center
Schafer, William D.; Wang, Yuh-Yin
A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn…
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
Locally linear regression for pose-invariant face recognition.
Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen
2007-07-01
The variation of facial appearance due to the viewpoint (/pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show distinct advantage of the proposed method over Eigen light-field method.
A pocket-sized metabolic analyzer for assessment of resting energy expenditure.
Zhao, Di; Xian, Xiaojun; Terrera, Mirna; Krishnan, Ranganath; Miller, Dylan; Bridgeman, Devon; Tao, Kevin; Zhang, Lihua; Tsow, Francis; Forzani, Erica S; Tao, Nongjian
2014-04-01
The assessment of metabolic parameters related to energy expenditure has a proven value for weight management; however these measurements remain too difficult and costly for monitoring individuals at home. The objective of this study is to evaluate the accuracy of a new pocket-sized metabolic analyzer device for assessing energy expenditure at rest (REE) and during sedentary activities (EE). The new device performs indirect calorimetry by measuring an individual's oxygen consumption (VO2) and carbon dioxide production (VCO2) rates, which allows the determination of resting- and sedentary activity-related energy expenditure. VO2 and VCO2 values of 17 volunteer adult subjects were measured during resting and sedentary activities in order to compare the metabolic analyzer with the Douglas bag method. The Douglas bag method is considered the Gold Standard method for indirect calorimetry. Metabolic parameters of VO2, VCO2, and energy expenditure were compared using linear regression analysis, paired t-tests, and Bland-Altman plots. Linear regression analysis of measured VO2 and VCO2 values, as well as calculated energy expenditure assessed with the new analyzer and Douglas bag method, had the following linear regression parameters (linear regression slope LRS0, and R-squared coefficient, r(2)) with p = 0: LRS0 (SD) = 1.00 (0.01), r(2) = 0.9933 for VO2; LRS0 (SD) = 1.00 (0.01), r(2) = 0.9929 for VCO2; and LRS0 (SD) = 1.00 (0.01), r(2) = 0.9942 for energy expenditure. In addition, results from paired t-tests did not show statistical significant difference between the methods with a significance level of α = 0.05 for VO2, VCO2, REE, and EE. Furthermore, the Bland-Altman plot for REE showed good agreement between methods with 100% of the results within ±2SD, which was equivalent to ≤10% error. The findings demonstrate that the new pocket-sized metabolic analyzer device is accurate for determining VO2, VCO2, and energy expenditure. Copyright © 2013 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
Huppert, Theodore J
2016-01-01
Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
Sinha, Nikita; Reddy, K Mahendranadh; Gupta, Nidhi; Shastry, Y M
2017-01-01
Occlusal plane (OP) differs considerably in participants with skeletal Class I and Class II participants. In this study, cephalometrics has been used to help in the determination of orientation of the OP utilizing the nonresorbable bony anatomic landmarks in skeletal Class II participants and an attempt has been made to predict and examine the OP in individuals with skeletal class II jaw relationship. One hundred dentulous participants with skeletal Class II malocclusion who came to the hospital for correcting their jaw relationship participated in the study. Their right lateral cephalogram was taken using standardized procedures, and all the tracings were manually done by a single trained examiner. The cephalograms which were taken for the diagnostic purpose were utilized for the study, and the patient was not exposed to any unnecessary radiation. The numerical values obtained from the cephalograms were subjected to statistical analysis. Pearson's correlation of <0.001 was considered significant, and a linear regression analysis was performed to determine a formula which would help in the determination of orientation of the OP in Class II edentulous participants. Pearson's correlation coefficient and linear regression analysis were performed, and a high correlation was found between A2 and (A2 + B2)/(B2 + C2) with " r " value of 0.5. A medium correlation was found between D2 and (D2 + E2)/(E2 + F2) with " r " value of 0.42. The formula obtained for posterior reference frame through linear regression equation was y = 0.018* × +0.459 and the formula obtained for anterior reference frame was y1 = 0.011* × 1 + 0.497. It was hypothesized that by substituting these formulae in the cephalogram obtained from the Class II edentate individual, the OP can be obtained and verified. It was concluded that cephalometrics can be useful in examining the orientation of OP in skeletal Class II participants.
Tu, Yu-Kang; Krämer, Nicole; Lee, Wen-Chung
2012-07-01
In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.
Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.
Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T
2016-12-01
We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmoL/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Ruuska, Salla; Hämäläinen, Wilhelmiina; Kajava, Sari; Mughal, Mikaela; Matilainen, Pekka; Mononen, Jaakko
2018-03-01
The aim of the present study was to evaluate empirically confusion matrices in device validation. We compared the confusion matrix method to linear regression and error indices in the validation of a device measuring feeding behaviour of dairy cattle. In addition, we studied how to extract additional information on classification errors with confusion probabilities. The data consisted of 12 h behaviour measurements from five dairy cows; feeding and other behaviour were detected simultaneously with a device and from video recordings. The resulting 216 000 pairs of classifications were used to construct confusion matrices and calculate performance measures. In addition, hourly durations of each behaviour were calculated and the accuracy of measurements was evaluated with linear regression and error indices. All three validation methods agreed when the behaviour was detected very accurately or inaccurately. Otherwise, in the intermediate cases, the confusion matrix method and error indices produced relatively concordant results, but the linear regression method often disagreed with them. Our study supports the use of confusion matrix analysis in validation since it is robust to any data distribution and type of relationship, it makes a stringent evaluation of validity, and it offers extra information on the type and sources of errors. Copyright © 2018 Elsevier B.V. All rights reserved.
Miele, Andrew; Thompson, Morgan; Jao, Nancy C; Kalhan, Ravi; Leone, Frank; Hogarth, Lee; Hitsman, Brian; Schnoll, Robert
2018-01-01
A substantial proportion of cancer patients continue to smoke after their diagnosis but few studies have evaluated correlates of nicotine dependence and smoking rate in this population, which could help guide smoking cessation interventions. This study evaluated correlates of smoking rate and nicotine dependence among 207 cancer patients. A cross-sectional analysis using multiple linear regression evaluated disease, demographic, affective, and tobacco-seeking correlates of smoking rate and nicotine dependence. Smoking rate was assessed using a timeline follow-back method. The Fagerström Test for Nicotine Dependence measured levels of nicotine dependence. A multiple linear regression predicting nicotine dependence showed an association with smoking to alleviate a sense of addiction from the Reasons for Smoking scale and tobacco-seeking behavior from the concurrent choice task ( p < .05), but not with affect measured by the HADS and PANAS ( p > .05). Multiple linear regression predicting prequit showed an association with smoking to alleviate addiction ( p < .05). ANOVA showed that Caucasian participants reported greater rates of smoking compared to other races. The results suggest that behavioral smoking cessation interventions that focus on helping patients to manage tobacco-seeking behavior, rather than mood management interventions, could help cancer patients quit smoking.
Tsai, Hsin-Jung; Kuo, Terry B J; Lin, Yu-Cheng; Yang, Cheryl C H
2015-12-30
A blunting of heart rate (HR) reduction during sleep has been reported to be associated with increased all-cause mortality. An increased incident of cardiovascular events has been observed in patients with insomnia but the relationship between nighttime HR and insomnia remains unclear. Here we investigated the HR patterns during the sleep onset period and its association with the length of sleep onset latency (SOL). Nineteen sleep-onset insomniacs (SOI) and 14 good sleepers had their sleep analyzed. Linear regression and nonlinear Hilbert-Huang transform (HHT) of the HR slope were performed in order to analyze HR dynamics during the sleep onset period. A significant depression in HR fluctuation was identified among the SOI group during the sleep onset period when linear regression and HHT analysis were applied. The magnitude of the HR reduction was associated with both polysomnography-defined and subjective SOL; moreover, we found that the linear regression and HHT slopes of the HR showed great sensitivity with respect to sleep quality. Our findings indicate that HR dynamics during the sleep onset period are sensitive to sleep initiation difficulty and respond to the SOL, which indicates that the presence of autonomic dysfunction would seem to affect the progress of falling asleep. Copyright © 2015. Published by Elsevier Ireland Ltd.
Solar energy distribution over Egypt using cloudiness from Meteosat photos
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mosalam Shaltout, M.A.; Hassen, A.H.
1990-01-01
In Egypt, there are 10 ground stations for measuring the global solar radiation, and five stations for measuring the diffuse solar radiation. Every day at noon, the Meteorological Authority in Cairo receives three photographs of cloudiness over Egypt from the Meteosat satellite, one in the visible, and two in the infra-red bands (10.5-12.5 {mu}m) and (5.7-7.1 {mu}m). The monthly average cloudiness for 24 sites over Egypt are measured and calculated from Meteosat observations during the period 1985-1986. Correlation analysis between the cloudiness observed by Meteosat and global solar radiation measured from the ground stations is carried out. It is foundmore » that, the correlation coefficients are about 0.90 for the simple linear regression, and increase for the second and third degree regressions. Also, the correlation coefficients for the cloudiness with the diffuse solar radiation are about 0.80 for the simple linear regression, and increase for the second and third degree regression. Models and empirical relations for estimating the global and diffuse solar radiation from Meteosat cloudiness data over Egypt are deduced and tested. Seasonal maps for the global and diffuse radiation over Egypt are carried out.« less
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
NASA Astrophysics Data System (ADS)
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-03-01
From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.
A Study of the Effect of the Front-End Styling of Sport Utility Vehicles on Pedestrian Head Injuries
Qin, Qin; Chen, Zheng; Bai, Zhonghao; Cao, Libo
2018-01-01
Background The number of sport utility vehicles (SUVs) on China market is continuously increasing. It is necessary to investigate the relationships between the front-end styling features of SUVs and head injuries at the styling design stage for improving the pedestrian protection performance and product development efficiency. Methods Styling feature parameters were extracted from the SUV side contour line. And simplified finite element models were established based on the 78 SUV side contour lines. Pedestrian headform impact simulations were performed and validated. The head injury criterion of 15 ms (HIC15) at four wrap-around distances was obtained. A multiple linear regression analysis method was employed to describe the relationships between the styling feature parameters and the HIC15 at each impact point. Results The relationship between the selected styling features and the HIC15 showed reasonable correlations, and the regression models and the selected independent variables showed statistical significance. Conclusions The regression equations obtained by multiple linear regression can be used to assess the performance of SUV styling in protecting pedestrians' heads and provide styling designers with technical guidance regarding their artistic creations.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-01-01
From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states. PMID:26996254
Explicit criteria for prioritization of cataract surgery
Ma Quintana, José; Escobar, Antonio; Bilbao, Amaia
2006-01-01
Background Consensus techniques have been used previously to create explicit criteria to prioritize cataract extraction; however, the appropriateness of the intervention was not included explicitly in previous studies. We developed a prioritization tool for cataract extraction according to the RAND method. Methods Criteria were developed using a modified Delphi panel judgment process. A panel of 11 ophthalmologists was assembled. Ratings were analyzed regarding the level of agreement among panelists. We studied the effect of all variables on the final panel score using general linear and logistic regression models. Priority scoring systems were developed by means of optimal scaling and general linear models. The explicit criteria developed were summarized by means of regression tree analysis. Results Eight variables were considered to create the indications. Of the 310 indications that the panel evaluated, 22.6% were considered high priority, 52.3% intermediate priority, and 25.2% low priority. Agreement was reached for 31.9% of the indications and disagreement for 0.3%. Logistic regression and general linear models showed that the preoperative visual acuity of the cataractous eye, visual function, and anticipated visual acuity postoperatively were the most influential variables. Alternative and simple scoring systems were obtained by optimal scaling and general linear models where the previous variables were also the most important. The decision tree also shows the importance of the previous variables and the appropriateness of the intervention. Conclusion Our results showed acceptable validity as an evaluation and management tool for prioritizing cataract extraction. It also provides easy algorithms for use in clinical practice. PMID:16512893
Huff, Andrew G; Hodges, James S; Kennedy, Shaun P; Kircher, Amy
2015-08-01
To protect and secure food resources for the United States, it is crucial to have a method to compare food systems' criticality. In 2007, the U.S. government funded development of the Food and Agriculture Sector Criticality Assessment Tool (FASCAT) to determine which food and agriculture systems were most critical to the nation. FASCAT was developed in a collaborative process involving government officials and food industry subject matter experts (SMEs). After development, data were collected using FASCAT to quantify threats, vulnerabilities, consequences, and the impacts on the United States from failure of evaluated food and agriculture systems. To examine FASCAT's utility, linear regression models were used to determine: (1) which groups of questions posed in FASCAT were better predictors of cumulative criticality scores; (2) whether the items included in FASCAT's criticality method or the smaller subset of FASCAT items included in DHS's risk analysis method predicted similar criticality scores. Akaike's information criterion was used to determine which regression models best described criticality, and a mixed linear model was used to shrink estimates of criticality for individual food and agriculture systems. The results indicated that: (1) some of the questions used in FASCAT strongly predicted food or agriculture system criticality; (2) the FASCAT criticality formula was a stronger predictor of criticality compared to the DHS risk formula; (3) the cumulative criticality formula predicted criticality more strongly than weighted criticality formula; and (4) the mixed linear regression model did not change the rank-order of food and agriculture system criticality to a large degree. © 2015 Society for Risk Analysis.
Impact of divorce on the quality of life in school-age children.
Eymann, Alfredo; Busaniche, Julio; Llera, Julián; De Cunto, Carmen; Wahren, Carlos
2009-01-01
To assess psychosocial quality of life in school-age children of divorced parents. A cross-sectional survey was conducted at the pediatric outpatient clinic of a community hospital. Children 5 to 12 years old from married families and divorced families were included. Child quality of life was assessed through maternal reports using a Child Health Questionnaire-Parent Form 50. A multiple linear regression model was constructed including clinically relevant variables significant on univariate analysis (beta coefficient and 95%CI). Three hundred and thirty families were invited to participate and 313 completed the questionnaire. Univariate analysis showed that quality of life was significantly associated with parental separation, child sex, time spent with the father, standard of living, and maternal education. In a multiple linear regression model, quality of life scores decreased in boys -4.5 (-6.8 to -2.3) and increased for time spent with the father 0.09 (0.01 to 0.2). In divorced families, multiple linear regression showed that quality of life scores increased when parents had separated by mutual agreement 6.1 (2.7 to 9.4), when the mother had university level education 5.9 (1.7 to 10.1) and for each year elapsed since separation 0.6 (0.2 to 1.1), whereas scores decreased in boys -5.4 (-9.5 to -1.3) and for each one-year increment of maternal age -0.4 (-0.7 to -0.05). Children's psychosocial quality of life was affected by divorce. The Child Health Questionnaire can be useful to detect a decline in the psychosocial quality of life.
Ramasubramanian, Viswanathan; Glasser, Adrian
2015-01-01
PURPOSE To determine whether relatively low-resolution ultrasound biomicroscopy (UBM) can predict the accommodative optical response in prepresbyopic eyes as well as in a previous study of young phakic subjects, despite lower accommodative amplitudes. SETTING College of Optometry, University of Houston, Houston, USA. DESIGN Observational cross-sectional study. METHODS Static accommodative optical response was measured with infrared photorefraction and an autorefractor (WR-5100K) in subjects aged 36 to 46 years. A 35 MHz UBM device (Vumax, Sonomed Escalon) was used to image the left eye, while the right eye viewed accommodative stimuli. Custom-developed Matlab image-analysis software was used to perform automated analysis of UBM images to measure the ocular biometry parameters. The accommodative optical response was predicted from biometry parameters using linear regression, 95% confidence intervals (CIs), and 95% prediction intervals. RESULTS The study evaluated 25 subjects. Per-diopter (D) accommodative changes in anterior chamber depth (ACD), lens thickness, anterior and posterior lens radii of curvature, and anterior segment length were similar to previous values from young subjects. The standard deviations (SDs) of accommodative optical response predicted from linear regressions for UBM-measured biometry parameters were ACD, 0.15 D; lens thickness, 0.25 D; anterior lens radii of curvature, 0.09 D; posterior lens radii of curvature, 0.37 D; and anterior segment length, 0.42 D. CONCLUSIONS Ultrasound biomicroscopy parameters can, on average, predict accommodative optical response with SDs of less than 0.55 D using linear regressions and 95% CIs. Ultrasound biomicroscopy can be used to visualize and quantify accommodative biometric changes and predict accommodative optical response in prepresbyopic eyes. PMID:26049831
Sulfur Mustard Induces Apoptosis in Lung Epithelial Cells via a Caspase Amplification Loop
2010-01-01
analysis using antibodies specific for exe- cutioner caspase-3. The positions of the immunoreactive proteins are indicated. Results shown are representative...respectively. The emission at 460nm from each sample was plotted against time, and linear regression analysis was used to determine the initial veloc- ity...follows, **pɘ.01, ***pɘ.001. .4. Immunoblot analysis SDS-PAGE and transfer of separated proteins to nitrocellulosemembranes were erformed according to
Li, Liang; Wang, Yiying; Xu, Jiting; Flora, Joseph R V; Hoque, Shamia; Berge, Nicole D
2018-08-01
Hydrothermal carbonization (HTC) is a wet, low temperature thermal conversion process that continues to gain attention for the generation of hydrochar. The importance of specific process conditions and feedstock properties on hydrochar characteristics is not well understood. To evaluate this, linear and non-linear models were developed to describe hydrochar characteristics based on data collected from HTC-related literature. A Sobol analysis was subsequently conducted to identify parameters that most influence hydrochar characteristics. Results from this analysis indicate that for each investigated hydrochar property, the model fit and predictive capability associated with the random forest models is superior to both the linear and regression tree models. Based on results from the Sobol analysis, the feedstock properties and process conditions most influential on hydrochar yield, carbon content, and energy content were identified. In addition, a variational process parameter sensitivity analysis was conducted to determine how feedstock property importance changes with process conditions. Copyright © 2018 Elsevier Ltd. All rights reserved.
Effect of Malmquist bias on correlation studies with IRAS data base
NASA Technical Reports Server (NTRS)
Verter, Frances
1993-01-01
The relationships between galaxy properties in the sample of Trinchieri et al. (1989) are reexamined with corrections for Malmquist bias. The linear correlations are tested and linear regressions are fit for log-log plots of L(FIR), L(H-alpha), and L(B) as well as ratios of these quantities. The linear correlations for Malmquist bias are corrected using the method of Verter (1988), in which each galaxy observation is weighted by the inverse of its sampling volume. The linear regressions are corrected for Malmquist bias by a new method invented here in which each galaxy observation is weighted by its sampling volume. The results of correlation and regressions among the sample are significantly changed in the anticipated sense that the corrected correlation confidences are lower and the corrected slopes of the linear regressions are lower. The elimination of Malmquist bias eliminates the nonlinear rise in luminosity that has caused some authors to hypothesize additional components of FIR emission.
Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.
Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander
2017-01-01
Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transfrom (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.
Changes in aerobic power of women, ages 20-64 yr
NASA Technical Reports Server (NTRS)
Jackson, A. S.; Wier, L. T.; Ayers, G. W.; Beard, E. F.; Stuteville, J. E.; Blair, S. N.
1996-01-01
This study quantified and compared the cross-sectional and longitudinal influence of age, self-report physical activity (SR-PA), and body composition (%fat) on the decline of maximal aerobic power (VO2peak) of women. The cross-sectional sample consisted of 409 healthy women, ages 20-64 yr. The 43 women of the longitudinal sample were from the same population and examined twice, the mean time between tests was 3.7 (+/-2.2) yr. Peak oxygen uptake was determined by indirect calorimetry during a maximal treadmill test. The zero-order correlation of -0.742 between VO2peak and %fat was significantly (P < 0.05) higher then the SR-PA (r = 0.626) and age correlations (r = -0.633). Linear regression defined the cross-sectional age-related decline in VO2peak at 0.537 ml.kg-1.min-1.yr-1. Multiple regression analysis (R = 0.851) showed that adding %fat and SR-PA and their interaction to the regression model reduced the age regression weight of -0.537, to -0.265 ml.kg-1.min-1.yr-1. Statistically controlling for time differences between tests, general linear models analysis showed that longitudinal changes in aerobic power were due to independent changes in %fat and SR-PA, confirming the cross-sectional results. These findings are consistent with men's data from the same lab showing that about 50% of the cross-sectional age-related decline in VO2peak was due to %fat and SR-PA.
C-17 Centerlining - Analysis of Paratrooper Trajectory
2005-06-01
Linear Regression Models. 4th ed. McGraw Hill, 2004. Kuntavanish, Mark. Program Engineer for C-17 System Program Office. Briefing to US Army...Airdrop Risk Assessment Using Bootstrap Sampling, MS AFIT Thesis AFIT/GOR/ENS/96D-01, Dec 1996. Kutner, Michael, C. Nachtsheim, and J. Neter. Applied
Comparative Research of Navy Voluntary Education at Operational Commands
2017-03-01
return on investment, ROI, logistic regression, multivariate analysis, descriptive statistics, Markov, time-series, linear programming 15. NUMBER...21 B. DESCRIPTIVE STATISTICS TABLES ...............................................25 C. PRIVACY CONSIDERATIONS...THIS PAGE INTENTIONALLY LEFT BLANK xi LIST OF TABLES Table 1. Variables and Descriptions . Adapted from NETC (2016). .......................21
A Quantitative Assessment of Student Performance and Examination Format
ERIC Educational Resources Information Center
Davison, Christopher B.; Dustova, Gandzhina
2017-01-01
This research study describes the correlations between student performance and examination format in a higher education teaching and research institution. The researchers employed a quantitative, correlational methodology utilizing linear regression analysis. The data was obtained from undergraduate student test scores over a three-year time span.…
Pessimism, Trauma, Risky Sex: Covariates of Depression in College Students
ERIC Educational Resources Information Center
Swanholm, Eric; Vosvick, Mark; Chng, Chwee-Lye
2009-01-01
Objective: To explain variance in depression in students (N = 648) using a model incorporating sexual trauma, pessimism, and risky sex. Method: Survey data collected from undergraduate students receiving credit for participation. Results: Controlling for demographics, a hierarchical linear regression analysis [Adjusted R[superscript 2] = 0.34,…
Fish assemblages at 16 sites in the upper French Broad River basin, North Carolina were related to environmental variables using detrended correspondence analysis (DCA) and linear regression. This study was conducted at the landscape scale because regional variables are controlle...
Centering Effects in HLM Level-1 Predictor Variables.
ERIC Educational Resources Information Center
Schumacker, Randall E.; Bembry, Karen
Research has suggested that important research questions can be addressed with meaningful interpretations using hierarchical linear modeling (HLM). The proper interpretation of results, however, is invariably linked to the choice of centering for the Level-1 predictor variables that produce the outcome measure for the Level-2 regression analysis.…
Statistical considerations in the analysis of data from replicated bioassays
USDA-ARS?s Scientific Manuscript database
Multiple-dose bioassay is generally the preferred method for characterizing virulence of insect pathogens. Linear regression of probit mortality on log dose enables estimation of LD50/LC50 and slope, the latter having substantial effect on LD90/95s (doses of considerable interest in pest management)...
Exploring Race Differences in Correlates of Seniors' Satisfaction with Undergraduate Education
ERIC Educational Resources Information Center
Einarson, Marne K.; Matier, Michael W.
2005-01-01
This study employed multiple linear regression and decision tree analysis to examine the correlates of overall satisfaction with undergraduate education for white, Asian American, Latino and African American seniors enrolled at 17 doctoral/research universities. Satisfaction with the overall quality of instruction and social involvement were the…
Exploring Race Differences in Correlates of Seniors' Satisfaction with Undergraduate Education
ERIC Educational Resources Information Center
Einarson, Marne K.; Matier, Michael W.
2004-01-01
This study employed multiple linear regression and decision tree analysis to examine the correlates of overall satisfaction with undergraduate education for white, Asian American, Hispanic and African American seniors enrolled at 17 research-extensive universities. Satisfaction with the overall quality of instruction and social involvement were…
USDA-ARS?s Scientific Manuscript database
Papaya (Carica papaya L.) cultivars and breeding lines were evaluated for resistance to Enterobacter cloacae (Jordan) Hormaeche & Edwards, the bacterial causal agent of internal yellowing disease (IY), using a range of concentrations of the bacterium. Linear regression analysis was performed and IY ...
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 20 2013-07-01 2013-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 40 Protection of Environment 20 2012-07-01 2012-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
Ding, Changfeng; Li, Xiaogang; Zhang, Taolin; Ma, Yibing; Wang, Xingxiang
2014-10-01
Soil environmental quality standards in respect of heavy metals for farmlands should be established considering both their effects on crop yield and their accumulation in the edible part. A greenhouse experiment was conducted to investigate the effects of chromium (Cr) on biomass production and Cr accumulation in carrot plants grown in a wide range of soils. The results revealed that carrot yield significantly decreased in 18 of the total 20 soils with Cr addition being the soil environmental quality standard of China. The Cr content of carrot grown in the five soils with pH>8.0 exceeded the maximum allowable level (0.5mgkg(-1)) according to the Chinese General Standard for Contaminants in Foods. The relationship between carrot Cr concentration and soil pH could be well fitted (R(2)=0.70, P<0.0001) by a linear-linear segmented regression model. The addition of Cr to soil influenced carrot yield firstly rather than the food quality. The major soil factors controlling Cr phytotoxicity and the prediction models were further identified and developed using path analysis and stepwise multiple linear regression analysis. Soil Cr thresholds for phytotoxicity meanwhile ensuring food safety were then derived on the condition of 10 percent yield reduction. Copyright © 2014 Elsevier Inc. All rights reserved.
Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio
2016-10-01
We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two 13 C atoms ( 13 C 2 -glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of 13 C 2 -glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% 13 C 2 -glycine at 1 : 1, 1 : 3 and 3 : 1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Li, T.; Griffiths, W. D.; Chen, J.
2017-11-01
The Maximum Likelihood method and the Linear Least Squares (LLS) method have been widely used to estimate Weibull parameters for reliability of brittle and metal materials. In the last 30 years, many researchers focused on the bias of Weibull modulus estimation, and some improvements have been achieved, especially in the case of the LLS method. However, there is a shortcoming in these methods for a specific type of data, where the lower tail deviates dramatically from the well-known linear fit in a classic LLS Weibull analysis. This deviation can be commonly found from the measured properties of materials, and previous applications of the LLS method on this kind of dataset present an unreliable linear regression. This deviation was previously thought to be due to physical flaws ( i.e., defects) contained in materials. However, this paper demonstrates that this deviation can also be caused by the linear transformation of the Weibull function, occurring in the traditional LLS method. Accordingly, it may not be appropriate to carry out a Weibull analysis according to the linearized Weibull function, and the Non-linear Least Squares method (Non-LS) is instead recommended for the Weibull modulus estimation of casting properties.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
A diagnostic analysis of the VVP single-doppler retrieval technique
NASA Technical Reports Server (NTRS)
Boccippio, Dennis J.
1995-01-01
A diagnostic analysis of the VVP (volume velocity processing) retrieval method is presented, with emphasis on understanding the technique as a linear, multivariate regression. Similarities and differences to the velocity-azimuth display and extended velocity-azimuth display retrieval techniques are discussed, using this framework. Conventional regression diagnostics are then employed to quantitatively determine situations in which the VVP technique is likely to fail. An algorithm for preparation and analysis of a robust VVP retrieval is developed and applied to synthetic and actual datasets with high temporal and spatial resolution. A fundamental (but quantifiable) limitation to some forms of VVP analysis is inadequate sampling dispersion in the n space of the multivariate regression, manifest as a collinearity between the basis functions of some fitted parameters. Such collinearity may be present either in the definition of these basis functions or in their realization in a given sampling configuration. This nonorthogonality may cause numerical instability, variance inflation (decrease in robustness), and increased sensitivity to bias from neglected wind components. It is shown that these effects prevent the application of VVP to small azimuthal sectors of data. The behavior of the VVP regression is further diagnosed over a wide range of sampling constraints, and reasonable sector limits are established.
Structured functional additive regression in reproducing kernel Hilbert spaces
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2013-01-01
Summary Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application. PMID:25013362
Parker, Kristin M; Wilson, Mark G; Vandenberg, Robert J; DeJoy, David M; Orpinas, Pamela
2009-10-01
This study tests the hypothesis that employees with comorbid physical health conditions and mental health symptoms are less productive than other employees. Self-reported health status and productivity measures were collected from 1723 employees of a national retail organization. chi2, analysis of variance, and linear contrast analyses were conducted to evaluate whether health status groups differed on productivity measures. Multivariate linear regression and multinomial logistic regression analyses were conducted to analyze how predictive health status was of productivity. Those with comorbidities were significantly less productive on all productivity measures compared with all other health status groups and those with only physical health conditions or mental health symptoms. Health status also significantly predicted levels of employee productivity. These findings provide evidence for the relationship between health statuses and productivity, which has potential programmatic implications.
NASA Technical Reports Server (NTRS)
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
Normality of raw data in general linear models: The most widespread myth in statistics
Kery, Marc; Hatfield, Jeff S.
2003-01-01
In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
A Linear Regression and Markov Chain Model for the Arabian Horse Registry
1993-04-01
as a tax deduction? Yes No T-4367 68 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be... (Mark one...E L T-4367 A Linear Regression and Markov Chain Model For the Arabian Horse Registry Accesion For NTIS CRA&I UT 7 4:iC=D 5 D-IC JA" LI J:13tjlC,3 lO...the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses . A linear regression model was utilized to
On the use of regression analysis for the estimation of human biological age.
Krøll, J; Saxtrup, O
2000-01-01
The present investigation compares three linear regression procedures for the definition of human biological age (bioage). As a model system for bioage definition is used the variations with age of blood hemoglobin (B-hemoglobin) in males in the age range 50-95 years. The bioage measures compared are: 1: P-bioage; defined from regression of chronological age on B-hemoglobin results. 2: AC-bioage; obtained by indirect regression, using in reverse the equation describing the regression of B-hemoglobin on age in a reference population. 3: BC-bioage; defined by orthogonal regression on the reference regression line of B-hemoglobin on age. It is demonstrated that the P-bioage measure gives an overestimation of the bioage in the younger and an underestimation in the older individuals. This 'regression to the mean' is avoided using the indirect regression procedures. Here the relatively low SD of the BC-bioage measure results from the inclusion of individual chronological age in the orthogonal regression procedure. Observations on male blood donors illustrates the variation of the AC- and BC-bioage measures in the individual.
Application of artificial neural network to fMRI regression analysis.
Misaki, Masaya; Miyauchi, Satoru
2006-01-15
We used an artificial neural network (ANN) to detect correlations between event sequences and fMRI (functional magnetic resonance imaging) signals. The layered feed-forward neural network, given a series of events as inputs and the fMRI signal as a supervised signal, performed a non-linear regression analysis. This type of ANN is capable of approximating any continuous function, and thus this analysis method can detect any fMRI signals that correlated with corresponding events. Because of the flexible nature of ANNs, fitting to autocorrelation noise is a problem in fMRI analyses. We avoided this problem by using cross-validation and an early stopping procedure. The results showed that the ANN could detect various responses with different time courses. The simulation analysis also indicated an additional advantage of ANN over non-parametric methods in detecting parametrically modulated responses, i.e., it can detect various types of parametric modulations without a priori assumptions. The ANN regression analysis is therefore beneficial for exploratory fMRI analyses in detecting continuous changes in responses modulated by changes in input values.
Hassan, A K
2015-01-01
In this work, O/W emulsion sets were prepared by using different concentrations of two nonionic surfactants. The two surfactants, tween 80(HLB=15.0) and span 80(HLB=4.3) were used in a fixed proportions equal to 0.55:0.45 respectively. HLB value of the surfactants blends were fixed at 10.185. The surfactants blend concentration is starting from 3% up to 19%. For each O/W emulsion set the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying the simple linear regression least squares method statistical analysis to the temperature-conductivity obtained data determines the effective surfactants blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by applying the physical stability centrifugation testing and the phase inversion temperature range measurements. The results indicated that, the relation which represents the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that, the most stable O/W emulsion is determined via the determination of the maximum R² value by applying of the simple linear regression least squares method to the temperature-conductivity obtained data up to 80°, in addition to, the true maximum slope is represented by the equation which has the maximum R² value. Because the conditions would be changed in a more complex formulation, the method of the determination of the effective surfactants blend concentration was verified by applying it for more complex formulations of 2% O/W miconazole nitrate cream and the results indicate its reproducibility.
Oil and gas pipeline construction cost analysis and developing regression models for cost estimation
NASA Astrophysics Data System (ADS)
Thaduri, Ravi Kiran
In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, Labor, ROW and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The Compressor stations are analyzed based on the capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor station, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor station for various capacities and locations.
Spectral Regression Discriminant Analysis for Hyperspectral Image Classification
NASA Astrophysics Data System (ADS)
Pan, Y.; Wu, J.; Huang, H.; Liu, J.
2012-08-01
Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for Hyperspectral Image Classification. The manifold learning methods are popular for dimensionality reduction, such as Locally Linear Embedding, Isomap, and Laplacian Eigenmap. However, a disadvantage of many manifold learning methods is that their computations usually involve eigen-decomposition of dense matrices which is expensive in both time and memory. In this paper, we introduce a new dimensionality reduction method, called Spectral Regression Discriminant Analysis (SRDA). SRDA casts the problem of learning an embedding function into a regression framework, which avoids eigen-decomposition of dense matrices. Also, with the regression based framework, different kinds of regularizes can be naturally incorporated into our algorithm which makes it more flexible. It can make efficient use of data points to discover the intrinsic discriminant structure in the data. Experimental results on Washington DC Mall and AVIRIS Indian Pines hyperspectral data sets demonstrate the effectiveness of the proposed method.
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Porcaro, Antonio B; Petrozziello, Aldo; Romano, Mario; Sava, Teodoro; Ghimenton, Claudio; Caruso, Beatrice; Migliorini, Filippo; Zecchini Antoniolli, Stefano; Rubilotta, Emanuele; Lacola, Vincenzo; Monaco, Carmelo; Comunale, Luigi
2010-01-01
Prostate cancer is an interesting tumor for endocrine investigation. The prostate-specific antigen/free testosterone (PSA/FT) ratio has been shown to be effective in clustering patients in prognostic groups as follows: low risk (PSA/FT ≤0.20), intermediate risk (PSA/FT >0.20 and ≤0.40) and high risk (PSA/FT >0.40 and ≤1.5). In the present study we explored the total PSA and FT distributions, and linear regression of FT predicting PSA in the different groups (PSA/FT, pT and pG) and subgroups (pT and pG) of patients according to the prognostic PSA/FT ratio. The study included 128 operated prostate cancer patients. Pretreatment simultaneous serum samples were obtained for measuring free testosterone (FT) and total PSA levels. Patients were grouped according to the total PSA/FT ratio prognostic clusters (≤0.20, >0.20 and ≤0.40, >0.40), pT (2, 3a and 3b+4) and pathological Gleason score (pG) (≤6, = 7 >3 + 4, ≥7 >4 + 3). The pT and pG sets were subgrouped according to the prognostic PSA/FT ratio. Linear regression analysis of FT predicting total PSA was computed according to the different PSA/FT prognostic clusters for the: (1) total sample population, (2) pT and pG groups, (3) intraprostatic (pT2) and extraprostatic disease (pT3a/3b/4), and (4) low-intermediate grade (pG ≤6) and high-grade (pG ≥7) prostate cancer. Analysis of variance always showed highly significant different PSA distributions for (1) the different PSA/FT, pT and pG groups; and (2) the pT and pG prognostic subgroups. Significant FT distributions were detected for the (1) PSA/FT and pT groups; and (2) the pT2, pT3a and pG ≤6 prognostic PSA/FT subgroups. Correlation, variance and linear regression analysis of FT predicting total PSA was significant for (1) the PSA/FT prognostic clusters, (2) all the pT2 and pT3a subgroups, and (3) the pT3b/4 subgroup with PSA/FT >0.20 and ≤0.40, and (4) all the pG subsets. Linear regression analysis showed that the slopes of the predicting variable (FT) were always highly significant for patients with (1) intraprostate and extraprostate disease, and (2) low-grade and high-grade prostate cancer. According to the prognostic PSA/FT ratio, significantly lower levels of FT are detected in prostate cancer patients with extensive and high-grade disease. Also, significant linear correlations of FT predicting PSA are assessed in the different groups and subgroups of patients clustered according to the prognostic PSA/FT ratio. Confirmatory studies are needed. Copyright © 2010 S. Karger AG, Basel.
Changes in Clavicle Length and Maturation in Americans: 1840-1980.
Langley, Natalie R; Cridlin, Sandra
2016-01-01
Secular changes refer to short-term biological changes ostensibly due to environmental factors. Two well-documented secular trends in many populations are earlier age of menarche and increasing stature. This study synthesizes data on maximum clavicle length and fusion of the medial epiphysis in 1840-1980 American birth cohorts to provide a comprehensive assessment of developmental and morphological change in the clavicle. Clavicles from the Hamann-Todd Human Osteological Collection (n = 354), McKern and Stewart Korean War males (n = 341), Forensic Anthropology Data Bank (n = 1,239), and the McCormick Clavicle Collection (n = 1,137) were used in the analysis. Transition analysis was used to evaluate fusion of the medial epiphysis (scored as unfused, fusing, or fused). Several statistical treatments were used to assess fluctuations in maximum clavicle length. First, Durbin-Watson tests were used to evaluate autocorrelation, and a local regression (LOESS) was used to identify visual shifts in the regression slope. Next, piecewise regression was used to fit linear regression models before and after the estimated breakpoints. Multiple starting parameters were tested in the range determined to contain the breakpoint, and the model with the smallest mean squared error was chosen as the best fit. The parameters from the best-fit models were then used to derive the piecewise models, which were compared with the initial simple linear regression models to determine which model provided the best fit for the secular change data. The epiphyseal union data indicate a decline in the age at onset of fusion since the early twentieth century. Fusion commences approximately four years earlier in mid- to late twentieth-century birth cohorts than in late nineteenth- and early twentieth-century birth cohorts. However, fusion is completed at roughly the same age across cohorts. The most significant decline in age at onset of epiphyseal union appears to have occurred since the mid-twentieth century. LOESS plots show a breakpoint in the clavicle length data around the mid-twentieth century in both sexes, and piecewise regression models indicate a significant decrease in clavicle length in the American population after 1940. The piecewise model provides a slightly better fit than the simple linear model. Since the model standard error is not substantially different from the piecewise model, an argument could be made to select the less complex linear model. However, we chose the piecewise model to detect changes in clavicle length that are overfitted with a linear model. The decrease in maximum clavicle length is in line with a documented narrowing of the American skeletal form, as shown by analyses of cranial and facial breadth and bi-iliac breadth of the pelvis. Environmental influences on skeletal form include increases in body mass index, health improvements, improved socioeconomic status, and elimination of infectious diseases. Secular changes in bony dimensions and skeletal maturation stipulate that medical and forensic standards used to deduce information about growth, health, and biological traits must be derived from modern populations.
NASA Astrophysics Data System (ADS)
Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran
2018-03-01
This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, using multivariate regression for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis (as an MPM method) to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the results of the probability value (p value), coefficient of determination value (R2) and adjusted determination coefficient (Radj2), the second regression model (which consistent of multiple UIVs) fitted better than other models. The accuracy of the model was confirmed by iron outcrops map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).
Zhou, Qing-he; Zhu, Bo; Wei, Chang-na; Yan, Min
2016-03-24
Studies have shown that abdominal girth and vertebral column length have high predictive value for spinal spread after administering a dose of plain bupivacaine. we designed a study to identify the specific correlations between abdominal girth, vertebral column length and a 0.5% dosage of plain bupivacaine, which should provide a minimum upper block level (T12) and a suitable upper block level (T10) for lower limb surgeries. A suitable dose of 0.5% plain bupivacaine was administered intrathecally between the L3 and L4 vertebrae for lower limb surgeries. If the upper cephalad spread of the patient by loss of pinprick discrimination was T12 or T10, the patient was enrolled in this study. Five patient variables and intrathecal plain bupivacaine dose were recorded. Linear regression and multiple regression analyses were performed. Totals of 111 patients and 121 patients who lost pinprick discrimination at T12 and T10, respectively, were analyzed in this study. Linear regression analysis showed that only abdominal girth and plain bupivacaine dose were strongly correlated (r =-0.827 for T12, r = -0.806 for T10; both p < 0.0001). Multiple linear regression analysis showed that both abdominal girth and vertebral column length were the key determinants of plain bupivacaine dose (both p < 0.0001). R(2) was 0.874 and 0.860 for the loss of pinprick discrimination at T12 and T10, respectively. Our data indicated that vertebral column length and abdominal girth were strongly correlated with the dosage of intrathecal plain bupivacaine for the loss of pinprick discrimination at T12 and T10. The two regression equations were YT12 = 3.547 + 0.045X1-0.044X2 and YT10 = 3.848 + 0.047X1- 0.046X2 (Y, 0.5% plain bupivacaine volume; X1, vertebral column length;and X 2, abdominal girth), which can accurately predict the minimum and suitable intrathecal bupivacaine dose for lower limb surgery to a great extent, separately.
Sun, You-Wen; Liu, Wen-Qing; Wang, Shi-Mei; Huang, Shu-Hua; Yu, Xiao-Man
2011-10-01
A method of interference correction for nondispersive infrared multi-component gas analysis was described. According to the successive integral gas absorption models and methods, the influence of temperature and air pressure on the integral line strengths and linetype was considered, and based on Lorentz detuning linetypes, the absorption cross sections and response coefficients of H2O, CO2, CO, and NO on each filter channel were obtained. The four dimension linear regression equations for interference correction were established by response coefficients, the absorption cross interference was corrected by solving the multi-dimensional linear regression equations, and after interference correction, the pure absorbance signal on each filter channel was only controlled by the corresponding target gas concentration. When the sample cell was filled with gas mixture with a certain concentration proportion of CO, NO and CO2, the pure absorbance after interference correction was used for concentration inversion, the inversion concentration error for CO2 is 2.0%, the inversion concentration error for CO is 1.6%, and the inversion concentration error for NO is 1.7%. Both the theory and experiment prove that the interference correction method proposed for NDIR multi-component gas analysis is feasible.
Measurement Consistency from Magnetic Resonance Images
Chung, Dongjun; Chung, Moo K.; Durtschi, Reid B.; Lindell, R. Gentry; Vorperian, Houri K.
2010-01-01
Rationale and Objectives In quantifying medical images, length-based measurements are still obtained manually. Due to possible human error, a measurement protocol is required to guarantee the consistency of measurements. In this paper, we review various statistical techniques that can be used in determining measurement consistency. The focus is on detecting a possible measurement bias and determining the robustness of the procedures to outliers. Materials and Methods We review correlation analysis, linear regression, Bland-Altman method, paired t-test, and analysis of variance (ANOVA). These techniques were applied to measurements, obtained by two raters, of head and neck structures from magnetic resonance images (MRI). Results The correlation analysis and the linear regression were shown to be insufficient for detecting measurement inconsistency. They are also very sensitive to outliers. The widely used Bland-Altman method is a visualization technique so it lacks the numerical quantification. The paired t-test tends to be sensitive to small measurement bias. On the other hand, ANOVA performs well even under small measurement bias. Conclusion In almost all cases, using only one method is insufficient and it is recommended to use several methods simultaneously. In general, ANOVA performs the best. PMID:18790405
Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...
2015-12-04
Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalizedmore » linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.« less
NASA Technical Reports Server (NTRS)
Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
Khamanga, Sandile M; Walker, Roderick B
2011-01-15
An accurate, sensitive and specific high performance liquid chromatography-electrochemical detection (HPLC-ECD) method that was developed and validated for captopril (CPT) is presented. Separation was achieved using a Phenomenex(®) Luna 5 μm (C(18)) column and a mobile phase comprised of phosphate buffer (adjusted to pH 3.0): acetonitrile in a ratio of 70:30 (v/v). Detection was accomplished using a full scan multi channel ESA Coulometric detector in the "oxidative-screen" mode with the upstream electrode (E(1)) set at +600 mV and the downstream (analytical) electrode (E(2)) set at +950 mV, while the potential of the guard cell was maintained at +1050 mV. The detector gain was set at 300. Experimental design using central composite design (CCD) was used to facilitate method development. Mobile phase pH, molarity and concentration of acetonitrile (ACN) were considered the critical factors to be studied to establish the retention time of CPT and cyclizine (CYC) that was used as the internal standard. Twenty experiments including centre points were undertaken and a quadratic model was derived for the retention time for CPT using the experimental data. The method was validated for linearity, accuracy, precision, limits of quantitation and detection, as per the ICH guidelines. The system was found to produce sharp and well-resolved peaks for CPT and CYC with retention times of 3.08 and 7.56 min, respectively. Linear regression analysis for the calibration curve showed a good linear relationship with a regression coefficient of 0.978 in the concentration range of 2-70 μg/mL. The linear regression equation was y=0.0131x+0.0275. The limits of detection (LOQ) and quantitation (LOD) were found to be 2.27 and 0.6 μg/mL, respectively. The method was used to analyze CPT in tablets. The wide range for linearity, accuracy, sensitivity, short retention time and composition of the mobile phase indicated that this method is better for the quantification of CPT than the pharmacopoeial methods. Copyright © 2010 Elsevier B.V. All rights reserved.
Biostatistics Series Module 10: Brief Overview of Multivariate Methods.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. The calculation intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with wider availability, and increasing sophistication of statistical software and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.
Regression-based model of skin diffuse reflectance for skin color analysis
NASA Astrophysics Data System (ADS)
Tsumura, Norimichi; Kawazoe, Daisuke; Nakaguchi, Toshiya; Ojima, Nobutoshi; Miyake, Yoichi
2008-11-01
A simple regression-based model of skin diffuse reflectance is developed based on reflectance samples calculated by Monte Carlo simulation of light transport in a two-layered skin model. This reflectance model includes the values of spectral reflectance in the visible spectra for Japanese women. The modified Lambert Beer law holds in the proposed model with a modified mean free path length in non-linear density space. The averaged RMS and maximum errors of the proposed model were 1.1 and 3.1%, respectively, in the above range.
Methods for scalar-on-function regression.
Reiss, Philip T; Goldsmith, Jeff; Shang, Han Lin; Ogden, R Todd
2017-08-01
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.
Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C
2008-07-21
The purpose of this study was both putting forward a statistically correct model for film calibration and the optimization of this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes, an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve with the dose as a function of OD (inverse regression) or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.
On the impact of relatedness on SNP association analysis.
Gross, Arnd; Tönjes, Anke; Scholz, Markus
2017-12-06
When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicate manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. We derive explicit and easily to apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. Next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced in dependence on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate.
ERIC Educational Resources Information Center
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
Li, Siyue; Zhang, Quanfa
2011-06-15
Water samples were collected for determination of dissolved trace metals in 56 sampling sites throughout the upper Han River, China. Multivariate statistical analyses including correlation analysis, stepwise multiple linear regression models, and principal component and factor analysis (PCA/FA) were employed to examine the land use influences on trace metals, and a receptor model of factor analysis-multiple linear regression (FA-MLR) was used for source identification/apportionment of anthropogenic heavy metals in the surface water of the River. Our results revealed that land use was an important factor in water metals in the snow melt flow period and land use in the riparian zone was not a better predictor of metals than land use away from the river. Urbanization in a watershed and vegetation along river networks could better explain metals, and agriculture, regardless of its relative location, however slightly explained metal variables in the upper Han River. FA-MLR analysis identified five source types of metals, and mining, fossil fuel combustion, and vehicle exhaust were the dominant pollutions in the surface waters. The results demonstrated great impacts of human activities on metal concentrations in the subtropical river of China. Copyright © 2011 Elsevier B.V. All rights reserved.
Association between the Type of Workplace and Lung Function in Copper Miners
Gruszczyński, Leszek; Wojakowska, Anna; Ścieszka, Marek; Turczyn, Barbara; Schmidt, Edward
2016-01-01
The aim of the analysis was to retrospectively assess changes in lung function in copper miners depending on the type of workplace. In the groups of 225 operators, 188 welders, and 475 representatives of other jobs, spirometry was performed at the start of employment and subsequently after 10, 20, and 25 years of work. Spirometry Longitudinal Data Analysis software was used to estimate changes in group means for FEV1 and FVC. Multiple linear regression analysis was used to assess an association between workplace and lung function. Lung function assessed on the basis of calculation of longitudinal FEV1 (FVC) decline was similar in all studied groups. However, multiple linear regression model used in cross-sectional analysis revealed an association between workplace and lung function. In the group of welders, FEF75 was lower in comparison to operators and other miners as early as after 10 years of work. Simultaneously, in smoking welders, the FEV1/FVC ratio was lower than in nonsmokers (p < 0,05). The interactions between type of workplace and smoking (p < 0,05) in their effect on FVC, FEV1, PEF, and FEF50 were shown. Among underground working copper miners, the group of smoking welders is especially threatened by impairment of lung ventilatory function. PMID:27274987
Regression Analysis of Top of Descent Location for Idle-thrust Descents
NASA Technical Reports Server (NTRS)
Stell, Laurel; Bronsvoort, Jesper; McDonald, Greg
2013-01-01
In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. The independent variables cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also recorded or computed post-operations. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajec- tory parameters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowl- edge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace. In particular, a model for TOD location that is linear in the independent variables would enable decision support tool human-machine interfaces for which a kinetic approach would be computationally too slow.
Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression
Cai, Xianlei; Wang, Chen; Yu, Wanqi; Fan, Wenjie; Wang, Shan; Shen, Ning; Wu, Pengcheng; Li, Xiuyang; Wang, Fudi
2016-01-01
The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to obtain available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95%CI: 0.73–0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium had the efficacy on cancer prevention. However, we did not find a protective efficacy of selenium supplement. High selenium exposure may have different effects on specific types of cancer. It decreased the risk of breast cancer, lung cancer, esophageal cancer, gastric cancer, and prostate cancer, but it was not associated with colorectal cancer, bladder cancer, and skin cancer. PMID:26786590
The Association of Fever with Total Mechanical Ventilation Time in Critically Ill Patients.
Park, Dong Won; Egi, Moritoki; Nishimura, Masaji; Chang, Youjin; Suh, Gee Young; Lim, Chae Man; Kim, Jae Yeol; Tada, Keiichi; Matsuo, Koichi; Takeda, Shinhiro; Tsuruta, Ryosuke; Yokoyama, Takeshi; Kim, Seon Ok; Koh, Younsuck
2016-12-01
This research aims to investigate the impact of fever on total mechanical ventilation time (TVT) in critically ill patients. Subgroup analysis was conducted using a previous prospective, multicenter observational study. We included mechanically ventilated patients for more than 24 hours from 10 Korean and 15 Japanese intensive care units (ICU), and recorded maximal body temperature under the support of mechanical ventilation (MAX(MV)). To assess the independent association of MAX(MV) with TVT, we used propensity-matched analysis in a total of 769 survived patients with medical or surgical admission, separately. Together with multiple linear regression analysis to evaluate the association between the severity of fever and TVT, the effect of MAX(MV) on ventilator-free days was also observed by quantile regression analysis in all subjects including non-survivors. After propensity score matching, a MAX(MV) ≥ 37.5°C was significantly associated with longer mean TVT by 5.4 days in medical admission, and by 1.2 days in surgical admission, compared to those with MAX(MV) of 36.5°C to 37.4°C. In multivariate linear regression analysis, patients with three categories of fever (MAX(MV) of 37.5°C to 38.4°C, 38.5°C to 39.4°C, and ≥ 39.5°C) sustained a significantly longer duration of TVT than those with normal range of MAX(MV) in both categories of ICU admission. A significant association between MAX(MV) and mechanical ventilator-free days was also observed in all enrolled subjects. Fever may be a detrimental factor to prolong TVT in mechanically ventilated patients. These findings suggest that fever in mechanically ventilated patients might be associated with worse mechanical ventilation outcome.
Estimating top-of-atmosphere thermal infrared radiance using MERRA-2 atmospheric data
NASA Astrophysics Data System (ADS)
Kleynhans, Tania; Montanaro, Matthew; Gerace, Aaron; Kanan, Christopher
2017-05-01
Thermal infrared satellite images have been widely used in environmental studies. However, satellites have limited temporal resolution, e.g., 16 day Landsat or 1 to 2 day Terra MODIS. This paper investigates the use of the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis data product, produced by NASA's Global Modeling and Assimilation Office (GMAO) to predict global topof-atmosphere (TOA) thermal infrared radiance. The high temporal resolution of the MERRA-2 data product presents opportunities for novel research and applications. Various methods were applied to estimate TOA radiance from MERRA-2 variables namely (1) a parameterized physics based method, (2) Linear regression models and (3) non-linear Support Vector Regression. Model prediction accuracy was evaluated using temporally and spatially coincident Moderate Resolution Imaging Spectroradiometer (MODIS) thermal infrared data as reference data. This research found that Support Vector Regression with a radial basis function kernel produced the lowest error rates. Sources of errors are discussed and defined. Further research is currently being conducted to train deep learning models to predict TOA thermal radiance
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. Especially the effects of turbulence and pressure disturbances by the chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of-possibly redundant-inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Salturk, Cuneyt; Karakurt, Zuhal; Takir, Huriye Berk; Balci, Merih; Kargin, Feyza; Mocin, Ozlem Yazıcıoglu; Gungor, Gokay; Ozmen, Ipek; Oztas, Selahattin; Yalcinsoy, Murat; Evin, Ruya; Ozturk, Murat; Adiguzel, Nalan
2015-01-01
The objective of this study was to compare the change in 6-minute walking distance (6MWD) in 1 year as an indicator of exercise capacity among patients undergoing home non-invasive mechanical ventilation (NIMV) due to chronic hypercapnic respiratory failure (CHRF) caused by different etiologies. This retrospective cohort study was conducted in a tertiary pulmonary disease hospital in patients who had completed 1-year follow-up under home NIMV because of CHRF with different etiologies (ie, chronic obstructive pulmonary disease [COPD], obesity hypoventilation syndrome [OHS], kyphoscoliosis [KS], and diffuse parenchymal lung disease [DPLD]), between January 2011 and January 2012. The results of arterial blood gas (ABG) analyses and spirometry, and 6MWD measurements with 12-month interval were recorded from the patient files, in addition to demographics, comorbidities, and body mass indices. The groups were compared in terms of 6MWD via analysis of variance (ANOVA) and multiple linear regression (MLR) analysis (independent variables: analysis age, sex, baseline 6MWD, baseline forced expiratory volume in 1 second, and baseline partial carbon dioxide pressure, in reference to COPD group). A total of 105 patients with a mean age (± standard deviation) of 61±12 years of whom 37 had COPD, 34 had OHS, 20 had KS, and 14 had DPLD were included in statistical analysis. There were no significant differences between groups in the baseline and delta values of ABG and spirometry findings. Both univariate ANOVA and MLR showed that the OHS group had the lowest baseline 6MWD and the highest decrease in 1 year (linear regression coefficient -24.48; 95% CI -48.74 to -0.21, P=0.048); while the KS group had the best baseline values and the biggest improvement under home NIMV (linear regression coefficient 26.94; 95% CI -3.79 to 57.66, P=0.085). The 6MWD measurements revealed improvement in exercise capacity test in CHRF patients receiving home NIMV treatment on long-term depends on etiological diagnoses.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
NASA Astrophysics Data System (ADS)
Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea
2016-12-01
The assessment of metals and organic micropollutants contamination in agricultural soils is a difficult challenge due to the extensive area used to collect and analyze a very large number of samples. With Dioxins and dioxin-like PCBs measurement methods and subsequent the treatment of data, the European Community advises the develop low-cost and fast methods allowing routing analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work has been to find a method suitable to describe the relations occurring between organic and inorganic contaminants and use the value of the latter in order to forecast the former. In practice, the use of a metal portable soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique has shown to be an excellent forecasting method, though there is no linear correlation between the variables to be analyzed.
Mao, G D; Adeli, K; Eisenbrey, A B; Artiss, J D
1996-07-01
This evaluation was undertaken to verify the application protocol for the CK-MB assay on the ACCESS Immunoassay Analyzer (Sanofi Diagnostics Pasteur, Chaska, MN). The results show that the ACCESS CK-MB assay total imprecision was 6.8% to 9.1%. Analytical linearity of the ACCESS CK-MB assay was excellent in the range of < 1-214 micrograms/L. A comparison of the ACCESS CK-MB assay with the IMx (Abbott Laboratories, Abbott Park, IL) method shows good correlation r = 0.990 (n = 108). Linear regression analysis yielded Y = 1.36X-0.3, Sx/y = 7.2. ACCESS CK-MB values also correlated well with CK-MB by electrophoresis with r = 0.968 (n = 132). The linear regression equation for this comparison was Y = 1.08X + 1.4, Sx/y = 14.1. The expected non-myocardial infarction range of CK-MB determined by the ACCESS system was 1.3-9.4 micrograms/L (mean = 4.0, n = 58). The ACCESS CK-MB assay would appear to be rapid, precise and clinically useful.
NASA Technical Reports Server (NTRS)
Colwell, R. N. (Principal Investigator)
1983-01-01
The geometric quality of the TM and MSS film products were evaluated by making selective photo measurements such as scale, linear and area determinations; and by measuring the coordinates of known features on both the film products and map products and then relating these paired observations using a standard linear least squares regression approach. Quantitative interpretation tests are described which evaluate the quality and utility of the TM film products and various band combinations for detecting and identifying important forest and agricultural features.
External contribution to urban air pollution.
Grima, Ramon; Micallef, Alfred; Colls, Jeremy J
2002-02-01
Elevated particulate matter concentrations in urban locations have normally been associated with local traffic emissions. Recently it has been suggested that such episodes are influenced to a high degree by PM10 sources external to urban areas. To further corroborate this hypothesis, linear regression was sought between PM10 concentrations measured at eight urban sites in the U.K., with particulate sulphate concentration measured at two rural sites, for the years 1993-1997. Analysis of the slopes, intercepts and correlation coefficients indicate a possible relationship between urban PM10 and rural sulphate concentrations. The influences of wind direction and of the distance of the urban from the rural sites on the values of the three statistical parameters are also explored. The value of linear regression as an analysis tool in such cases is discussed and it is shown that an analysis of the sign of the rate of change of the urban PM10 and rural sulphate concentrations provides a more realistic method of correlation. The results indicate a major influence on urban PM10 concentrations from the eastern side of the United Kingdom. Linear correlation was also sought using PM10 data from nine urban sites in London and nearby rural Rochester. Analysis of the magnitude of the gradients and intercepts together with episode correlation analysis between the two sites showed the effect of transported PM10 on the local London concentrations. This article also presents methods to estimate the influence of rural and urban PM10 sources on urban PM10 concentrations and to obtain a rough estimate of the transboundary contribution to urban air pollution from the PM10 concentration data of the urban site.
Grantz, Erin; Haggard, Brian; Scott, J Thad
2018-06-12
We calculated four median datasets (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4) for approximately 100 Texas, USA reservoirs. Trend analyses of differences between dataset 1 and 3 medians indicated percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a- and transparency-TP relationships indicated threshold differences up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution or missing due to limitations of statistical methods biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in linear regression models relating transparency-TP between datasets 1, 2, and the more statistically robust datasets 3-4. Study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low 16% for multiple QL's. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
ERIC Educational Resources Information Center
Denham, Bryan E.
2009-01-01
Grounded conceptually in social cognitive theory, this research examines how personal, behavioral, and environmental factors are associated with risk perceptions of anabolic-androgenic steroids. Ordinal logistic regression and logit log-linear models applied to data gathered from high-school seniors (N = 2,160) in the 2005 Monitoring the Future…
Should a First Course in ANOVA Be Taught Through MLR?
ERIC Educational Resources Information Center
Williams, John D.
Before implementing a course in the analysis of variance (ANOVA) taught through multiple linear regression, several concerns must be addressed. Adequate computer facilities that are available to students on a low-cost or cost-free basis are necessary; also students must be able to meaningfully communicate with their major advisor regarding their…
Impact of Preadmission Variables on USMLE Step 1 and Step 2 Performance
ERIC Educational Resources Information Center
Kleshinski, James; Khuder, Sadik A.; Shapiro, Joseph I.; Gold, Jeffrey P.
2009-01-01
Purpose: To examine the predictive ability of preadmission variables on United States Medical Licensing Examinations (USMLE) step 1 and step 2 performance, incorporating the use of a neural network model. Method: Preadmission data were collected on matriculants from 1998 to 2004. Linear regression analysis was first used to identify predictors of…
ERIC Educational Resources Information Center
Romero, Andrea J.; Ruiz, Myrna
2007-01-01
We examined coping with risky behaviors (cigarettes, alcohol/drugs, yelling/ hitting, and anger), familism (family proximity and parental closeness) and parental monitoring (knowledge and discipline) in a sample of 56 adolescents (11-15 years old) predominantly of Mexican descent at two time points. Multiple linear regression analysis indicated…
A National Study of the Association between Food Environments and County-Level Health Outcomes
ERIC Educational Resources Information Center
Ahern, Melissa; Brown, Cheryl; Dukas, Stephen
2011-01-01
Purpose: This national, county-level study examines the relationship between food availability and access, and health outcomes (mortality, diabetes, and obesity rates) in both metro and non-metro areas. Methods: This is a secondary, cross-sectional analysis using Food Environment Atlas and CDC data. Linear regression models estimate relationships…
USDA-ARS?s Scientific Manuscript database
Gastrointestinal (GI) nematode infections are a worldwide threat to animal health and production. In this study, we performed a genome-wide association study between copy number variations (CNV) and resistance to GI nematodes in an Angus cattle population. Using a linear regression analysis, we iden...
Exposure to Media Violence and Other Correlates of Aggressive Behavior in Preschool Children
ERIC Educational Resources Information Center
Daly, Laura A.; Perez, Linda M.
2009-01-01
This article examines the play behavior of 70 preschool children and its relationship to television violence and regulatory status. Linear regression analysis showed that violent program content and poor self-regulation were independently and significantly associated with overall and physical aggression. Advanced maternal age and child age and…
Bioassay of the Nucleopolyhedrosis Virus of Neodiprion sertifer (Hymenoptera: Diprionidae)
M.A. Mohamed; J.D. Podgwaite
1982-01-01
Linear regression analysis of probit mortality versus several concentrations of nucleopolyhedrosis virus of Neodiprion sertifer resulted in the equation Y = 2.170 + 0.872X. An LC50 was calculated at 1758 PIB/ml. Also, the incubation time of the virus was dependent on its concentration. Most insect viruses possess the potential...
ERIC Educational Resources Information Center
Roulette-McIntyre, Ovella; Bagaka's, Joshua G.; Drake, Daniel D.
2005-01-01
This study identified parental practices that relate positively to high school students' academic performance. Parents of 643 high school students participated in the study. Data analysis, using a multiple linear regression model, shows parent-school connection, student gender, and race are significant predictors of student academic performance.…
ERIC Educational Resources Information Center
Dubnjakovic, Ana
2012-01-01
The current study investigates factors influencing increase in reference transactions in a typical week in academic libraries across the United States of America. Employing multiple regression analysis and general linear modeling, variables of interest from the "Academic Library Survey (ALS) 2006" survey (sample size 3960 academic libraries) were…
ERIC Educational Resources Information Center
Musekamp, Frank; Pearce, Jacob
2016-01-01
The goal of this paper is to examine the relationship of student motivation and achievement in low-stakes assessment contexts. Using Pearson product-moment correlations and hierarchical linear regression modelling to analyse data on 794 tertiary students who undertook a low-stakes engineering mechanics assessment (along with the questionnaire of…
Predictors of Career Adaptability Skill among Higher Education Students in Nigeria
ERIC Educational Resources Information Center
Ebenehi, Amos Shaibu; Rashid, Abdullah Mat; Bakar, Ab Rahim
2016-01-01
This paper examined predictors of career adaptability skill among higher education students in Nigeria. A sample of 603 higher education students randomly selected from six colleges of education in Nigeria participated in this study. A set of self-reported questionnaire was used for data collection, and multiple linear regression analysis was used…
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.
Nutritional status and weight gain in pregnant women.
Sato, Ana Paula Sayuri; Fujimori, Elizabeth
2012-01-01
This study described the nutritional status of 228 pregnant women and the influence of this on birth weight. This is a retrospective study, developed in a health center in the municipality of São Paulo, with data obtained from medical records. Linear regression analysis was carried out. An association was verified between the initial and final nutritional status (p<0.001). The mean of total weight gain in the pregnant women who began the pregnancy underweight was higher compared those who started overweight/obese (p=0.005). Weight gain was insufficient for 43.4% of the pregnant women with adequate initial weight and for 36.4% of all the pregnant women studied. However, 37.1% of those who began the pregnancy overweight/obese finished with excessive weight gain, a condition that ultimately affected almost a quarter of the pregnant women. Anemia and low birth weight were uncommon, however, in the linear regression analysis, birth weight was associated with weight gain (p<0.05). The study highlights the importance of nutritional care before and during pregnancy to promote maternal-infant health.