Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P value
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
2015-07-15
Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST...34Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R 2 ), using R 2 as the primary metric of assay agreement. However, the use of R 2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Quality of life in breast cancer patients--a quantile regression analysis.
Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma
2008-01-01
Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
Ernst, Anja F; Albers, Casper J
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.
Ernst, Anja F.
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971
Morse Code, Scrabble, and the Alphabet
ERIC Educational Resources Information Center
Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss
2004-01-01
In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Adding a Parameter Increases the Variance of an Estimated Regression Function
ERIC Educational Resources Information Center
Withers, Christopher S.; Nadarajah, Saralees
2011-01-01
The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Yang, Ruiqi; Wang, Fei; Zhang, Jialing; Zhu, Chonglei; Fan, Limei
2015-05-19
To establish the reference values of thalamus, caudate nucleus and lenticular nucleus diameters through fetal thalamic transverse section. A total of 265 fetuses at our hospital were randomly selected from November 2012 to August 2014. And the transverse and length diameters of thalamus, caudate nucleus and lenticular nucleus were measured. SPSS 19.0 statistical software was used to calculate the regression curve of fetal diameter changes and gestational weeks of pregnancy. P < 0.05 was considered as having statistical significance. The linear regression equation of fetal thalamic length diameter and gestational week was: Y = 0.051X+0.201, R = 0.876, linear regression equation of thalamic transverse diameter and fetal gestational week was: Y = 0.031X+0.229, R = 0.817, linear regression equation of fetal head of caudate nucleus length diameter and gestational age was: Y = 0.033X+0.101, R = 0.722, linear regression equation of fetal head of caudate nucleus transverse diameter and gestational week was: R = 0.025 - 0.046, R = 0.711, linear regression equation of fetal lentiform nucleus length diameter and gestational week was: Y = 0.046+0.229, R = 0.765, linear regression equation of fetal lentiform nucleus diameter and gestational week was: Y = 0.025 - 0.05, R = 0.772. Ultrasonic measurement of diameter of fetal thalamus caudate nucleus, and lenticular nucleus through thalamic transverse section is simple and convenient. And measurements increase with fetal gestational weeks and there is linear regression relationship between them.
Control Variate Selection for Multiresponse Simulation.
1987-05-01
M. H. Knuter, Applied Linear Regression Mfodels, Richard D. Erwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F., Probability, Allyn and Bacon...1982. Neter, J., V. Wasserman, and M. H. Knuter, Applied Linear Regression .fodels, Richard D. Erwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F...Aspects of J%,ultivariate Statistical Theory, John Wiley and Sons, New York, New York, 1982. dY Neter, J., W. Wasserman, and M. H. Knuter, Applied Linear Regression Mfodels
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
Two Paradoxes in Linear Regression Analysis.
Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong
2016-12-25
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
ERIC Educational Resources Information Center
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.
1983-09-01
books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Stanford, Applied Linear Regression , Wiley, 1980. 40
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Two Paradoxes in Linear Regression Analysis
FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong
2016-01-01
Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
O'Leary, Neil; Chauhan, Balwantray C; Artes, Paul H
2012-10-01
To establish a method for estimating the overall statistical significance of visual field deterioration from an individual patient's data, and to compare its performance to pointwise linear regression. The Truncated Product Method was used to calculate a statistic S that combines evidence of deterioration from individual test locations in the visual field. The overall statistical significance (P value) of visual field deterioration was inferred by comparing S with its permutation distribution, derived from repeated reordering of the visual field series. Permutation of pointwise linear regression (PoPLR) and pointwise linear regression were evaluated in data from patients with glaucoma (944 eyes, median mean deviation -2.9 dB, interquartile range: -6.3, -1.2 dB) followed for more than 4 years (median 10 examinations over 8 years). False-positive rates were estimated from randomly reordered series of this dataset, and hit rates (proportion of eyes with significant deterioration) were estimated from the original series. The false-positive rates of PoPLR were indistinguishable from the corresponding nominal significance levels and were independent of baseline visual field damage and length of follow-up. At P < 0.05, the hit rates of PoPLR were 12, 29, and 42%, at the fifth, eighth, and final examinations, respectively, and at matching specificities they were consistently higher than those of pointwise linear regression. In contrast to population-based progression analyses, PoPLR provides a continuous estimate of statistical significance for visual field deterioration individualized to a particular patient's data. This allows close control over specificity, essential for monitoring patients in clinical practice and in clinical trials.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) to from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Local Linear Regression for Data with AR Errors.
Li, Runze; Li, Yan
2009-07-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.
Conjoint Analysis: A Study of the Effects of Using Person Variables.
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method
ERIC Educational Resources Information Center
Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Su¨lzle, Detlev
2018-01-01
The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
Common pitfalls in statistical analysis: Linear regression analysis
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis. PMID:28447022
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Advanced microwave soil moisture studies. [Big Sioux River Basin, Iowa
NASA Technical Reports Server (NTRS)
Dalsted, K. J.; Harlan, J. C.
1983-01-01
Comparisons of low level L-band brightness temperature (TB) and thermal infrared (TIR) data as well as the following data sets: soil map and land cover data; direct soil moisture measurement; and a computer generated contour map were statistically evaluated using regression analysis and linear discriminant analysis. Regression analysis of footprint data shows that statistical groupings of ground variables (soil features and land cover) hold promise for qualitative assessment of soil moisture and for reducing variance within the sampling space. Dry conditions appear to be more conductive to producing meaningful statistics than wet conditions. Regression analysis using field averaged TB and TIR data did not approach the higher sq R values obtained using within-field variations. The linear discriminant analysis indicates some capacity to distinguish categories with the results being somewhat better on a field basis than a footprint basis.
NASA Astrophysics Data System (ADS)
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course
ERIC Educational Resources Information Center
Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna
2010-01-01
Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…
ERIC Educational Resources Information Center
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Enders, Felicity
2013-12-01
Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 postcourse , and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Postcourse students had similar results to practicing statisticians (mean (SD) of 20.1(3.5); P = 0.21). Students also showed significant improvement pre/postcourse in each of six domain areas (P < 0.001). The REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
1980-01-01
MATHPAC image-analysis library is collection of general-purpose mathematical and statistical routines and special-purpose data-analysis and pattern-recognition routines for image analysis. MATHPAC library consists of Linear Algebra, Optimization, Statistical-Summary, Densities and Distribution, Regression, and Statistical-Test packages.
Automating approximate Bayesian computation by local linear regression.
Thornton, Kevin R
2009-07-07
In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular.Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, the ABCreg simplifies implementing ABC based on local-linear regression.
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET d ) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas
2015-01-01
Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
Jupiter, Daniel C
2012-01-01
In this first of a series of statistical methodology commentaries for the clinician, we discuss the use of multivariate linear regression. Copyright © 2012 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
Linear regression models and k-means clustering for statistical analysis of fNIRS data
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-01-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Regression Commonality Analysis: A Technique for Quantitative Theory Building
ERIC Educational Resources Information Center
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
Pease, J M; Morselli, M F
1987-01-01
This paper deals with a computer program adapted to a statistical method for analyzing an unlimited quantity of binary recorded data of an independent circular variable (e.g. wind direction), and a linear variable (e.g. maple sap flow volume). Circular variables cannot be statistically analyzed with linear methods, unless they have been transformed. The program calculates a critical quantity, the acrophase angle (PHI, phi o). The technique is adapted from original mathematics [1] and is written in Fortran 77 for easier conversion between computer networks. Correlation analysis can be performed following the program or regression which, because of the circular nature of the independent variable, becomes periodic regression. The technique was tested on a file of approximately 4050 data pairs.
Agier, Lydiane; Portengen, Lützen; Chadeau-Hyam, Marc; Basagaña, Xavier; Giorgis-Allemand, Lise; Siroux, Valérie; Robinson, Oliver; Vlaanderen, Jelle; González, Juan R; Nieuwenhuijsen, Mark J; Vineis, Paolo; Vrijheid, Martine; Slama, Rémy; Vermeulen, Roel
2016-12-01
The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.
Lunt, Mark
2015-07-01
In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-06-01
This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
Advanced Statistics for Exotic Animal Practitioners.
Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G
2017-09-01
Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.
Ritz, Christian; Van der Vliet, Leana
2009-09-01
The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions-variance homogeneity and normality-that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
Li, Zhenghua; Cheng, Fansheng; Xia, Zhining
2011-01-01
The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the powerful predictive abilities of the optimization model appraised by leave-one-out cross-validation showed that the optimization model with the correlation coefficient (R) of 0.994 7 and the cross-validated correlation coefficient (Rcv) of 0.994 0 possessed the best statistical quality. Furthermore, when the 114 PASHs compounds were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed our models possesses almost equal statistical quality, the very similar regression coefficients and the good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
Detecting trends in raptor counts: power and type I error rates of various statistical tests
Hatfield, J.S.; Gould, W.R.; Hoover, B.A.; Fuller, M.R.; Lindquist, E.L.
1996-01-01
We conducted simulations that estimated power and type I error rates of statistical tests for detecting trends in raptor population count data collected from a single monitoring site. Results of the simulations were used to help analyze count data of bald eagles (Haliaeetus leucocephalus) from 7 national forests in Michigan, Minnesota, and Wisconsin during 1980-1989. Seven statistical tests were evaluated, including simple linear regression on the log scale and linear regression with a permutation test. Using 1,000 replications each, we simulated n = 10 and n = 50 years of count data and trends ranging from -5 to 5% change/year. We evaluated the tests at 3 critical levels (alpha = 0.01, 0.05, and 0.10) for both upper- and lower-tailed tests. Exponential count data were simulated by adding sampling error with a coefficient of variation of 40% from either a log-normal or autocorrelated log-normal distribution. Not surprisingly, tests performed with 50 years of data were much more powerful than tests with 10 years of data. Positive autocorrelation inflated alpha-levels upward from their nominal levels, making the tests less conservative and more likely to reject the null hypothesis of no trend. Of the tests studied, Cox and Stuart's test and Pollard's test clearly had lower power than the others. Surprisingly, the linear regression t-test, Collins' linear regression permutation test, and the nonparametric Lehmann's and Mann's tests all had similar power in our simulations. Analyses of the count data suggested that bald eagles had increasing trends on at least 2 of the 7 national forests during 1980-1989.
Quantile regression for the statistical analysis of immunological data with many non-detects.
Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth
2012-07-07
Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
High correlations between MRI brain volume measurements based on NeuroQuant® and FreeSurfer.
Ross, David E; Ochs, Alfred L; Tate, David F; Tokac, Umit; Seabaugh, John; Abildskov, Tracy J; Bigler, Erin D
2018-05-30
NeuroQuant ® (NQ) and FreeSurfer (FS) are commonly used computer-automated programs for measuring MRI brain volume. Previously they were reported to have high intermethod reliabilities but often large intermethod effect size differences. We hypothesized that linear transformations could be used to reduce the large effect sizes. This study was an extension of our previously reported study. We performed NQ and FS brain volume measurements on 60 subjects (including normal controls, patients with traumatic brain injury, and patients with Alzheimer's disease). We used two statistical approaches in parallel to develop methods for transforming FS volumes into NQ volumes: traditional linear regression, and Bayesian linear regression. For both methods, we used regression analyses to develop linear transformations of the FS volumes to make them more similar to the NQ volumes. The FS-to-NQ transformations based on traditional linear regression resulted in effect sizes which were small to moderate. The transformations based on Bayesian linear regression resulted in all effect sizes being trivially small. To our knowledge, this is the first report describing a method for transforming FS to NQ data so as to achieve high reliability and low effect size differences. Machine learning methods like Bayesian regression may be more useful than traditional methods. Copyright © 2018 Elsevier B.V. All rights reserved.
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
Transfer Student Success: Educationally Purposeful Activities Predictive of Undergraduate GPA
ERIC Educational Resources Information Center
Fauria, Renee M.; Fuller, Matthew B.
2015-01-01
Researchers evaluated the effects of Educationally Purposeful Activities (EPAs) on transfer and nontransfer students' cumulative GPAs. Hierarchical, linear, and multiple regression models yielded seven statistically significant educationally purposeful items that influenced undergraduate student GPAs. Statistically significant positive EPAs for…
Statistical downscaling of precipitation using long short-term memory recurrent neural networks
NASA Astrophysics Data System (ADS)
Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra
2017-11-01
Hydrological impacts of global climate change on regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on Recurrent Neural Network with Long Short-Term Memory which captures the spatio-temporal dependencies in local rainfall. The previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in Mahanadi basin in India and the second on precipitation in Campbell River basin in Canada. Our autoencoder coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both the datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, the multiple linear will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis and depreciate the significance of discriminant function and discrimination abilities of individual variables in discrimination analysis. The discussions will be restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experiment results, which the authors believe is very critical to research progress in theory development and cumulative knowledge in the ergonomics field.
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistic tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces into the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.
Statistical Tutorial | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. ST is designed as a follow up to Statistical Analysis of Research Data (SARD) held in April 2018. The tutorial will apply the general principles of statistical analysis of research data including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and Chi-Squared distribution.
NASA Technical Reports Server (NTRS)
Wilson, R. M.; Reichmann, E. J.; Teuber, D. L.
1984-01-01
An empirical method is developed to predict certain parameters of future solar activity cycles. Sunspot cycle statistics are examined, and curve fitting and linear regression analysis techniques are utilized.
NASA Technical Reports Server (NTRS)
Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.
1991-01-01
A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semiempirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produces predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis of fully-dense materials are in good agreement with those calculated from elastic properties.
NASA Technical Reports Server (NTRS)
Roth, D. J.; Swickard, S. M.; Stang, D. B.; Deguire, M. R.
1990-01-01
A review and statistical analysis of the ultrasonic velocity method for estimating the porosity fraction in polycrystalline materials is presented. Initially, a semi-empirical model is developed showing the origin of the linear relationship between ultrasonic velocity and porosity fraction. Then, from a compilation of data produced by many researchers, scatter plots of velocity versus percent porosity data are shown for Al2O3, MgO, porcelain-based ceramics, PZT, SiC, Si3N4, steel, tungsten, UO2,(U0.30Pu0.70)C, and YBa2Cu3O(7-x). Linear regression analysis produced predicted slope, intercept, correlation coefficient, level of significance, and confidence interval statistics for the data. Velocity values predicted from regression analysis for fully-dense materials are in good agreement with those calculated from elastic properties.
USDA-ARS?s Scientific Manuscript database
Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
Distributed Monitoring of the R(sup 2) Statistic for Linear Regression
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.
2011-01-01
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
ASURV: Astronomical SURVival Statistics
NASA Astrophysics Data System (ADS)
Feigelson, E. D.; Nelson, P. I.; Isobe, T.; LaValley, M.
2014-06-01
ASURV (Astronomical SURVival Statistics) provides astronomy survival analysis for right- and left-censored data including the maximum-likelihood Kaplan-Meier estimator and several univariate two-sample tests, bivariate correlation measures, and linear regressions. ASURV is written in FORTRAN 77, and is stand-alone and does not call any specialized libraries.
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) inMultiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Comparative Research of Navy Voluntary Education at Operational Commands
2017-03-01
return on investment, ROI, logistic regression, multivariate analysis, descriptive statistics, Markov, time-series, linear programming 15. NUMBER...21 B. DESCRIPTIVE STATISTICS TABLES ...............................................25 C. PRIVACY CONSIDERATIONS...THIS PAGE INTENTIONALLY LEFT BLANK xi LIST OF TABLES Table 1. Variables and Descriptions . Adapted from NETC (2016). .......................21
ERIC Educational Resources Information Center
Everson, Howard T.; And Others
This paper explores the feasibility of neural computing methods such as artificial neural networks (ANNs) and abductory induction mechanisms (AIM) for use in educational measurement. ANNs and AIMS methods are contrasted with more traditional statistical techniques, such as multiple regression and discriminant function analyses, for making…
Statistical Package User’s Guide.
1980-08-01
261 C. STACH Nonparametric Descriptive Statistics ... ......... ... 265 D. CHIRA Coefficient of Concordance...135 I.- -a - - W 7- Test Data: This program was tested using data from John Neter and William Wasserman, Applied Linear Statistical Models: Regression...length of data file e. new fileý name (not same as raw data file) 5. Printout as optioned for only. Comments: Ranked data are used for program CHIRA
ERIC Educational Resources Information Center
Baker, Bruce D.; Richards, Craig E.
1999-01-01
Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
Minimizing bias in biomass allometry: Model selection and log transformation of data
Joseph Mascaro; undefined undefined; Flint Hughes; Amanda Uowolo; Stefan A. Schnitzer
2011-01-01
Nonlinear regression is increasingly used to develop allometric equations for forest biomass estimation (i.e., as opposed to the raditional approach of log-transformation followed by linear regression). Most statistical software packages, however, assume additive errors by default, violating a key assumption of allometric theory and possibly producing spurious models....
Modeling Longitudinal Data Containing Non-Normal Within Subject Errors
NASA Technical Reports Server (NTRS)
Feiveson, Alan; Glenn, Nancy L.
2013-01-01
The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively few number of subjects; typically 10 – 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed–effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed–effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.
Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert
2012-01-01
Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, p<0.001). Univariate mixture model fits of FDGpre improved R2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
Application of linear regression analysis in accuracy assessment of rolling force calculations
NASA Astrophysics Data System (ADS)
Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.
1998-10-01
Efficient operation of the computational models employed in process control systems require periodical assessment of the accuracy of their predictions. Linear regression is proposed as a tool which allows separate systematic and random prediction errors from those related to measurements. A quantitative characteristic of the model predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example for the application. However, the outlined approach can be used to assess the performance of any computational model.
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
Lamm, Steven H; Ferdosi, Hamid; Dissen, Elisabeth K; Li, Ji; Ahn, Jaeil
2015-12-07
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1-1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100-150 µg/L arsenic.
Guisan, Antoine; Edwards, T.C.; Hastie, T.
2002-01-01
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. ?? 2002 Elsevier Science B.V. All rights reserved.
Lamm, Steven H.; Ferdosi, Hamid; Dissen, Elisabeth K.; Li, Ji; Ahn, Jaeil
2015-01-01
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1–1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100–150 µg/L arsenic. PMID:26690190
NASA Astrophysics Data System (ADS)
Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.
2017-01-01
One of the most noticeable features of sign-based statistical procedures is an opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we expanded a sing-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypothesis not only of regression parameters, but of noise parameters as well.
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, Richard M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
A New Sample Size Formula for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
NASA Astrophysics Data System (ADS)
Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.
2014-09-01
Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (Radj2=0.632, the root-mean-squared error of estimated AGB was 13.3 Mg ha-1 (or 37.7%), resulting from cross-validation), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
NASA Astrophysics Data System (ADS)
Shan, X.; Zhang, K.; Zhuang, Y.; Fu, R.; Hong, Y.
2017-12-01
Seasonal prediction of rainfall during the dry-to-wet transition season in austral spring (September-November) over southern Amazonia is central for improving planting crops and fire mitigation in that region. Previous studies have identified the key large-scale atmospheric dynamic and thermodynamics pre-conditions during the dry season (June-August) that influence the rainfall anomalies during the dry to wet transition season over Southern Amazonia. Based on these key pre-conditions during dry season, we have evaluated several statistical models and developed a Neural Network based statistical prediction system to predict rainfall during the dry to wet transition for Southern Amazonia (5-15°S, 50-70°W). Multivariate Empirical Orthogonal Function (EOF) Analysis is applied to the following four fields during JJA from the ECMWF Reanalysis (ERA-Interim) spanning from year 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition energy (CIN) index and convective available potential energy (CAPE), to filter out noise and highlight the most coherent spatial and temporal variations. The first 10 EOF modes are retained for inputs to the statistical models, accounting for at least 70% of the total variance in the predictor fields. We have tested several linear and non-linear statistical methods. While the regularized Ridge Regression and Lasso Regression can generally capture the spatial pattern and magnitude of rainfall anomalies, we found that that Neural Network performs best with an accuracy greater than 80%, as expected from the non-linear dependence of the rainfall on the large-scale atmospheric thermodynamic conditions and circulation. Further tests of various prediction skill metrics and hindcasts also suggest this Neural Network prediction approach can significantly improve seasonal prediction skill than the dynamic predictions and regression based statistical predictions. Thus, this statistical prediction system could have shown potential to improve real-time seasonal rainfall predictions in the future.
The Relationship Between Surface Curvature and Abdominal Aortic Aneurysm Wall Stress.
de Galarreta, Sergio Ruiz; Cazón, Aitor; Antón, Raúl; Finol, Ender A
2017-08-01
The maximum diameter (MD) criterion is the most important factor when predicting risk of rupture of abdominal aortic aneurysms (AAAs). An elevated wall stress has also been linked to a high risk of aneurysm rupture, yet is an uncommon clinical practice to compute AAA wall stress. The purpose of this study is to assess whether other characteristics of the AAA geometry are statistically correlated with wall stress. Using in-house segmentation and meshing algorithms, 30 patient-specific AAA models were generated for finite element analysis (FEA). These models were subsequently used to estimate wall stress and maximum diameter and to evaluate the spatial distributions of wall thickness, cross-sectional diameter, mean curvature, and Gaussian curvature. Data analysis consisted of statistical correlations of the aforementioned geometry metrics with wall stress for the 30 AAA inner and outer wall surfaces. In addition, a linear regression analysis was performed with all the AAA wall surfaces to quantify the relationship of the geometric indices with wall stress. These analyses indicated that while all the geometry metrics have statistically significant correlations with wall stress, the local mean curvature (LMC) exhibits the highest average Pearson's correlation coefficient for both inner and outer wall surfaces. The linear regression analysis revealed coefficients of determination for the outer and inner wall surfaces of 0.712 and 0.516, respectively, with LMC having the largest effect on the linear regression equation with wall stress. This work underscores the importance of evaluating AAA mean wall curvature as a potential surrogate for wall stress.
Thieler, E. Robert; Himmelstoss, Emily A.; Zichichi, Jessica L.; Ergul, Ayhan
2009-01-01
The Digital Shoreline Analysis System (DSAS) version 4.0 is a software extension to ESRI ArcGIS v.9.2 and above that enables a user to calculate shoreline rate-of-change statistics from multiple historic shoreline positions. A user-friendly interface of simple buttons and menus guides the user through the major steps of shoreline change analysis. Components of the extension and user guide include (1) instruction on the proper way to define a reference baseline for measurements, (2) automated and manual generation of measurement transects and metadata based on user-specified parameters, and (3) output of calculated rates of shoreline change and other statistical information. DSAS computes shoreline rates of change using four different methods: (1) endpoint rate, (2) simple linear regression, (3) weighted linear regression, and (4) least median of squares. The standard error, correlation coefficient, and confidence interval are also computed for the simple and weighted linear-regression methods. The results of all rate calculations are output to a table that can be linked to the transect file by a common attribute field. DSAS is intended to facilitate the shoreline change-calculation process and to provide rate-of-change information and the statistical data necessary to establish the reliability of the calculated results. The software is also suitable for any generic application that calculates positional change over time, such as assessing rates of change of glacier limits in sequential aerial photos, river edge boundaries, land-cover changes, and so on.
Cost Estimation of Naval Ship Acquisition.
1983-12-01
one a 9-sub- system model , the other a single total cost model . The models were developed using the linear least squares regression tech- nique with...to Linear Statistical Models , McGraw-Hill, 1961. 11. Helmer, F. T., Bibliography on Pricing Methodology and Cost Estimating, Dept. of Economics and...SUPPI.EMSaTARY NOTES IS. KWRo" (Cowaft. en tever aide of ..aesep M’ Idab~t 6 Week ONNa.) Cost estimation; Acquisition; Parametric cost estimate; linear
A new statistical approach to climate change detection and attribution
NASA Astrophysics Data System (ADS)
Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe
2017-01-01
We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the " models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (-0.01± 0.02 K).
Linear models: permutation methods
Cade, B.S.; Everitt, B.S.; Howell, D.C.
2005-01-01
Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well-known that estimates of the mean in linear model are extremely sensitive to even a single outlying value of the dependent variable compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution or responses, providing a more complete statistical picture that has relevance to many biological questions [6]...
NASA Astrophysics Data System (ADS)
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We exploremore » the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.« less
An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology
Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...
2017-05-15
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We exploremore » the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.« less
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields have not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant, modifying local temperature and precipitation. While dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate areas of the search space favoring to infer localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
Drivers willingness to pay progressive rate for street parking.
DOT National Transportation Integrated Search
2015-01-01
This study finds willingness to pay and price elasticity for on-street parking demand using stated : preference data obtained from 238 respondents. Descriptive, statistical and economic analyses including : regression, generalized linear model, and f...
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
Huppert, Theodore J
2016-01-01
Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
NASA Astrophysics Data System (ADS)
Lucifredi, A.; Mazzieri, C.; Rossi, M.
2000-05-01
Since the operational conditions of a hydroelectric unit can vary within a wide range, the monitoring system must be able to distinguish between the variations of the monitored variable caused by variations of the operation conditions and those due to arising and progressing of failures and misoperations. The paper aims to identify the best technique to be adopted for the monitoring system. Three different methods have been implemented and compared. Two of them use statistical techniques: the first, the linear multiple regression, expresses the monitored variable as a linear function of the process parameters (independent variables), while the second, the dynamic kriging technique, is a modified technique of multiple linear regression representing the monitored variable as a linear combination of the process variables in such a way as to minimize the variance of the estimate error. The third is based on neural networks. Tests have shown that the monitoring system based on the kriging technique is not affected by some problems common to the other two models e.g. the requirement of a large amount of data for their tuning, both for training the neural network and defining the optimum plane for the multiple regression, not only in the system starting phase but also after a trivial operation of maintenance involving the substitution of machinery components having a direct impact on the observed variable. Or, in addition, the necessity of different models to describe in a satisfactory way the different ranges of operation of the plant. The monitoring system based on the kriging statistical technique overrides the previous difficulties: it does not require a large amount of data to be tuned and is immediately operational: given two points, the third can be immediately estimated; in addition the model follows the system without adapting itself to it. The results of the experimentation performed seem to indicate that a model based on a neural network or on a linear multiple regression is not optimal, and that a different approach is necessary to reduce the amount of work during the learning phase using, when available, all the information stored during the initial phase of the plant to build the reference baseline, elaborating, if it is the case, the raw information available. A mixed approach using the kriging statistical technique and neural network techniques could optimise the result.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
NASA Technical Reports Server (NTRS)
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
Smith, S. Jerrod; Lewis, Jason M.; Graves, Grant M.
2015-09-28
Generalized-least-squares multiple-linear regression analysis was used to formulate regression relations between peak-streamflow frequency statistics and basin characteristics. Contributing drainage area was the only basin characteristic determined to be statistically significant for all percentage of annual exceedance probabilities and was the only basin characteristic used in regional regression equations for estimating peak-streamflow frequency statistics on unregulated streams in and near the Oklahoma Panhandle. The regression model pseudo-coefficient of determination, converted to percent, for the Oklahoma Panhandle regional regression equations ranged from about 38 to 63 percent. The standard errors of prediction and the standard model errors for the Oklahoma Panhandle regional regression equations ranged from about 84 to 148 percent and from about 76 to 138 percent, respectively. These errors were comparable to those reported for regional peak-streamflow frequency regression equations for the High Plains areas of Texas and Colorado. The root mean square errors for the Oklahoma Panhandle regional regression equations (ranging from 3,170 to 92,000 cubic feet per second) were less than the root mean square errors for the Oklahoma statewide regression equations (ranging from 18,900 to 412,000 cubic feet per second); therefore, the Oklahoma Panhandle regional regression equations produce more accurate peak-streamflow statistic estimates for the irrigated period of record in the Oklahoma Panhandle than do the Oklahoma statewide regression equations. The regression equations developed in this report are applicable to streams that are not substantially affected by regulation, impoundment, or surface-water withdrawals. These regression equations are intended for use for stream sites with contributing drainage areas less than or equal to about 2,060 square miles, the maximum value for the independent variable used in the regression analysis.
Pattern Recognition Analysis of Age-Related Retinal Ganglion Cell Signatures in the Human Eye
Yoshioka, Nayuta; Zangerl, Barbara; Nivison-Smith, Lisa; Khuu, Sieu K.; Jones, Bryan W.; Pfeiffer, Rebecca L.; Marc, Robert E.; Kalloniatis, Michael
2017-01-01
Purpose To characterize macular ganglion cell layer (GCL) changes with age and provide a framework to assess changes in ocular disease. This study used data clustering to analyze macular GCL patterns from optical coherence tomography (OCT) in a large cohort of subjects without ocular disease. Methods Single eyes of 201 patients evaluated at the Centre for Eye Health (Sydney, Australia) were retrospectively enrolled (age range, 20–85); 8 × 8 grid locations obtained from Spectralis OCT macular scans were analyzed with unsupervised classification into statistically separable classes sharing common GCL thickness and change with age. The resulting classes and gridwise data were fitted with linear and segmented linear regression curves. Additionally, normalized data were analyzed to determine regression as a percentage. Accuracy of each model was examined through comparison of predicted 50-year-old equivalent macular GCL thickness for the entire cohort to a true 50-year-old reference cohort. Results Pattern recognition clustered GCL thickness across the macula into five to eight spatially concentric classes. F-test demonstrated segmented linear regression to be the most appropriate model for macular GCL change. The pattern recognition–derived and normalized model revealed less difference between the predicted macular GCL thickness and the reference cohort (average ± SD 0.19 ± 0.92 and −0.30 ± 0.61 μm) than a gridwise model (average ± SD 0.62 ± 1.43 μm). Conclusions Pattern recognition successfully identified statistically separable macular areas that undergo a segmented linear reduction with age. This regression model better predicted macular GCL thickness. The various unique spatial patterns revealed by pattern recognition combined with core GCL thickness data provide a framework to analyze GCL loss in ocular disease. PMID:28632847
DOE Office of Scientific and Technical Information (OSTI.GOV)
Callister, Stephen J.; Barry, Richard C.; Adkins, Joshua N.
2006-02-01
Central tendency, linear regression, locally weighted regression, and quantile techniques were investigated for normalization of peptide abundance measurements obtained from high-throughput liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR MS). Arbitrary abundances of peptides were obtained from three sample sets, including a standard protein sample, two Deinococcus radiodurans samples taken from different growth phases, and two mouse striatum samples from control and methamphetamine-stressed mice (strain C57BL/6). The selected normalization techniques were evaluated in both the absence and presence of biological variability by estimating extraneous variability prior to and following normalization. Prior to normalization, replicate runs from each sample setmore » were observed to be statistically different, while following normalization replicate runs were no longer statistically different. Although all techniques reduced systematic bias, assigned ranks among the techniques revealed significant trends. For most LC-FTICR MS analyses, linear regression normalization ranked either first or second among the four techniques, suggesting that this technique was more generally suitable for reducing systematic biases.« less
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
Reliability Analysis of the Gradual Degradation of Semiconductor Devices.
1983-07-20
under the heading of linear models or linear statistical models . 3 ,4 We have not used this material in this report. Assuming catastrophic failure when...assuming a catastrophic model . In this treatment we first modify our system loss formula and then proceed to the actual analysis. II. ANALYSIS OF...Failure Time 1 Ti Ti 2 T2 T2 n Tn n and are easily analyzed by simple linear regression. Since we have assumed a log normal/Arrhenius activation
The Influence Factor Model for the Popularity of Mobile Phone without Considering the Price Factor
NASA Astrophysics Data System (ADS)
Long, Hongming; Peng, Diefei; Wu, Hailin; Yang, Zihui
2018-01-01
Based on the statistical data like economic development, social development, population indicator and so on, this paper establishes the linear regression model which influences the popularity rate of mobile phone users.
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-01
Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case control study, but Cox proportional hazard model is often used in survival data analysis. Most literature only refer to main effect model, however, generalized linear model differs from general linear model, and the interaction was composed of multiplicative interaction and additive interaction. The former is only statistical significant, but the latter has biological significance. In this paper, macros was written by using SAS 9.4 and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the items of logistic and Cox regression interactions, and the confidence intervals of Wald, delta and profile likelihood were used to evaluate additive interaction for the reference in big data analysis in clinical epidemiology and in analysis of genetic multiplicative and additive interactions.
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
Normality of raw data in general linear models: The most widespread myth in statistics
Kery, Marc; Hatfield, Jeff S.
2003-01-01
In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
Ergonomics study on mobile phones for thumb physiology discomfort
NASA Astrophysics Data System (ADS)
Bendero, J. M. S.; Doon, M. E. R.; Quiogue, K. C. A.; Soneja, L. C.; Ong, N. R.; Sauli, Z.; Vairavan, R.
2017-09-01
The study was conducted on Filipino undergraduate college students and aimed to find out about the significant factors associated with mobile phone usage and its effect on thumb pain.A correlation-prediction analysisand Multiple Linear Regression was adopted and used as the main tool in determining the significant factors and coming up with predictive models on thumb related pain. With the use of the software Statistical Package for the Social Sciences or SPSS in conducting linear regression, 2 significant factors on thumb-related pain (percentage of time using portrait as screen orientation when text messaging, amount of time playing games using one hand in a day) were found.
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.
An open-access CMIP5 pattern library for temperature and precipitation: description and methodology
NASA Astrophysics Data System (ADS)
Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben
2017-05-01
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
Marrero-Ponce, Yovani; Medina-Marrero, Ricardo; Castillo-Garit, Juan A; Romero-Zaldivar, Vicente; Torrens, Francisco; Castro, Eduardo A
2005-04-15
A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole protein) and local (one or more amino acid) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid level biochemical descriptors are based on the calculation of linear maps on Rn[f k(xmi):Rn-->Rn] in canonical basis. These bio-macromolecular indices are calculated from the kth power of the macromolecular pseudograph alpha-carbon atom adjacency matrix. Total linear indices are linear functional on Rn. That is, the kth total linear indices are linear maps from Rn to the scalar R[f k(xm):Rn-->R]. Thus, the kth total linear indices are calculated by summing the amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in the Arc repressor illustrates this approach. A quantitative model that discriminates near wild-type stability alanine mutants from the reduced-stability ones in a training series was obtained. This model permitted the correct classification of 97.56% (40/41) and 91.67% (11/12) of proteins in the training and test set, respectively. It shows a high Matthews correlation coefficient (MCC=0.952) for the training set and an MCC=0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model (Rcanc=0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. On the other hand, the linear piecewise regression model compared favorably with respect to the linear regression one on predicting the melting temperature (tm) of the Arc alanine mutants. The linear model explains almost 81% of the variance of the experimental tm (R=0.90 and s=4.29) and the LOO press statistics evidenced its predictive ability (q2=0.72 and scv=4.79). Moreover, the TOMOCOMD-CAMPS method produced a linear piecewise regression (R=0.97) between protein backbone descriptors and tm values for alanine mutants of the Arc repressor. A break-point value of 51.87 degrees C characterized two mutant clusters and coincided perfectly with the experimental scale. For this reason, we can use the linear discriminant analysis and piecewise models in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted the interpretation of the driving forces of such folding process, indicating that topologic/topographic protein backbone interactions control the stability profile of wild-type Arc and its alanine mutants.
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
Potential pitfalls when denoising resting state fMRI data using nuisance regression.
Bright, Molly G; Tench, Christopher R; Murphy, Kevin
2017-07-01
In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which the fit is calculated of a noise model of head motion and physiological processes to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors, the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. Non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
Pope, Larry M.; Diaz, A.M.
1982-01-01
Quality-of-water data, collected October 21-23, 1980, and a statistical summary are presented for 42 coal-mined strip pits in Crawford and Cherokee Counties, Southeastern Kansas. The statistical summary includes minimum and maximum observed values , mean, and standard deviation. Simple linear regression equations relating specific conductance, dissolved solids, and acidity to concentrations of dissolved solids, sulfate, calcium, and magnesium, potassium, aluminum, and iron are also presented. (USGS)
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. ?? 2005 Elsevier Ltd. All rights reserved.
Bebbington, Emily; Furniss, Dominic
2015-02-01
We integrated two factors, demographic population shifts and changes in prevalence of disease, to predict future trends in demand for hand surgery in England, to facilitate workforce planning. We analysed Hospital Episode Statistics data for Dupuytren's disease, carpal tunnel syndrome, cubital tunnel syndrome, and trigger finger from 1998 to 2011. Using linear regression, we estimated trends in both diagnosis and surgery until 2030. We integrated this regression with age specific population data from the Office for National Statistics in order to estimate how this will contribute to a change in workload over time. There has been a significant increase in both absolute numbers of diagnoses and surgery for all four conditions. Combined with future population data, we calculate that the total operative burden for these four conditions will increase from 87,582 in 2011 to 170,166 (95% confidence interval 144,517-195,353) in 2030. The prevalence of these diseases in the ageing population, and increasing prevalence of predisposing factors such as obesity and diabetes, may account for the predicted increase in workload. The most cost effective treatments must be sought, which requires high quality clinical trials. Our methodology can be applied to other sub-specialties to help anticipate the need for future service provision. Copyright © 2014 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Hester, Yvette
Least squares methods are sophisticated mathematical curve fitting procedures used in all classical parametric methods. The linear least squares approximation is most often associated with finding the "line of best fit" or the regression line. Since all statistical analyses are correlational and all classical parametric methods are least…
DEVELOPMENT OF THE VIRTUAL BEACH MODEL, PHASE 1: AN EMPIRICAL MODEL
With increasing attention focused on the use of multiple linear regression (MLR) modeling of beach fecal bacteria concentration, the validity of the entire statistical process should be carefully evaluated to assure satisfactory predictions. This work aims to identify pitfalls an...
Mathematics Readiness of First-Year University Students
ERIC Educational Resources Information Center
Atuahene, Francis; Russell, Tammy A.
2016-01-01
The majority of high school students, particularly underrepresented minorities (URMs) from low socioeconomic backgrounds are graduating from high school less prepared academically for advanced-level college mathematics. Using 2009 and 2010 course enrollment data, several statistical analyses (multiple linear regression, Cochran Mantel Haenszel…
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
NASA Astrophysics Data System (ADS)
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *
Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.
2014-01-01
The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPREM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844
Validating Variational Bayes Linear Regression Method With Multi-Central Datasets.
Murata, Hiroshi; Zangwill, Linda M; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Hirasawa, Kazunori; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Shoji, Nobuyuki; Asaoka, Ryo
2018-04-01
To validate the prediction accuracy of variational Bayes linear regression (VBLR) with two datasets external to the training dataset. The training dataset consisted of 7268 eyes of 4278 subjects from the University of Tokyo Hospital. The Japanese Archive of Multicentral Databases in Glaucoma (JAMDIG) dataset consisted of 271 eyes of 177 patients, and the Diagnostic Innovations in Glaucoma Study (DIGS) dataset includes 248 eyes of 173 patients, which were used for validation. Prediction accuracy was compared between the VBLR and ordinary least squared linear regression (OLSLR). First, OLSLR and VBLR were carried out using total deviation (TD) values at each of the 52 test points from the second to fourth visual fields (VFs) (VF2-4) to 2nd to 10th VF (VF2-10) of each patient in JAMDIG and DIGS datasets, and the TD values of the 11th VF test were predicted every time. The predictive accuracy of each method was compared through the root mean squared error (RMSE) statistic. OLSLR RMSEs with the JAMDIG and DIGS datasets were between 31 and 4.3 dB, and between 19.5 and 3.9 dB. On the other hand, VBLR RMSEs with JAMDIG and DIGS datasets were between 5.0 and 3.7, and between 4.6 and 3.6 dB. There was statistically significant difference between VBLR and OLSLR for both datasets at every series (VF2-4 to VF2-10) (P < 0.01 for all tests). However, there was no statistically significant difference in VBLR RMSEs between JAMDIG and DIGS datasets at any series of VFs (VF2-2 to VF2-10) (P > 0.05). VBLR outperformed OLSLR to predict future VF progression, and the VBLR has a potential to be a helpful tool at clinical settings.
NASA Astrophysics Data System (ADS)
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.
Efficacy of Social Media Adoption on Client Growth for Independent Management Consultants
2017-02-01
design , a linear multiple regression with three predictor variables and one dependent variable per testing were used. Under those circumstances...regression test was used to compare the social media adoption of two groups on a single measure to determine if there was a statistical difference...number and types of social media platforms used and their influence on client growth was examined in this research design that used a descriptive
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
Harris, Michael; Radtke, Arthur S.
1976-01-01
Linear regression and discriminant analyses techniques were applied to gold, mercury, arsenic, antimony, barium, copper, molybdenum, lead, zinc, boron, tellurium, selenium, and tungsten analyses from drill holes into unoxidized gold ore at the Carlin gold mine near Carlin, Nev. The statistical treatments employed were used to judge proposed hypotheses on the origin and geochemical paragenesis of this disseminated gold deposit.
Coronary artery calcium distributions in older persons in the AGES-Reykjavik study
Gudmundsson, Elias Freyr; Gudnason, Vilmundur; Sigurdsson, Sigurdur; Launer, Lenore J.; Harris, Tamara B.; Aspelund, Thor
2013-01-01
Coronary Artery Calcium (CAC) is a sign of advanced atherosclerosis and an independent risk factor for cardiac events. Here, we describe CAC-distributions in an unselected aged population and compare modelling methods to characterize CAC-distribution. CAC is difficult to model because it has a skewed and zero inflated distribution with over-dispersion. Data are from the AGES-Reykjavik sample, a large population based study [2002-2006] in Iceland of 5,764 persons aged 66-96 years. Linear regressions using logarithmic- and Box-Cox transformations on CAC+1, quantile regression and a Zero-Inflated Negative Binomial model (ZINB) were applied. Methods were compared visually and with the PRESS-statistic, R2 and number of detected associations with concurrently measured variables. There were pronounced differences in CAC according to sex, age, history of coronary events and presence of plaque in the carotid artery. Associations with conventional coronary artery disease (CAD) risk factors varied between the sexes. The ZINB model provided the best results with respect to the PRESS-statistic, R2, and predicted proportion of zero scores. The ZINB model detected similar numbers of associations as the linear regression on ln(CAC+1) and usually with the same risk factors. PMID:22990371
A Refined Model for Radar Homing Intercepts.
1983-10-27
Helge Toutenburq, Prior Information in Linear Models ,(Wiley, NY, 1982). 7. F. A. Graybill , Introduction to Matrices with Applications in StatisticF... linear target trajectory model z i = 0 + 1 r i + wi () where w i i=I,..., N is a sequence of uncorrelated zero-mean A noise, the general formula for...z i (i=l,..., N) at r. and a linear regression model 1 z i = a0 + a1 r i + w i =(Al) where wi is the corruption noise; the problem is to estimate a0
Quantile regression models of animal habitat relationships
Cade, Brian S.
2003-01-01
Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). Depending on whether interactions of the measured habitat and unmeasured variable were negative (interference interactions) or positive (facilitation interactions), either upper (τ > 0.5) or lower (τ < 0.5) quantile regression parameters were less biased than mean rate parameters. Sampling (n = 20 - 300) simulations demonstrated that confidence intervals constructed by inverting rankscore tests provided valid coverage of these biased parameters. Quantile regression was used to estimate effects of physical habitat resources on a bivalve mussel (Macomona liliana) in a New Zealand harbor by modeling the spatial trend surface as a cubic polynomial of location coordinates.
Muradian, Kh K; Utko, N O; Mozzhukhina, T H; Pishel', I M; Litoshenko, O Ia; Bezrukov, V V; Fraĭfel'd, V E
2002-01-01
Correlative and regressive relations between the gaseous exchange, thermoregulation and mitochondrial protein content were analyzed by two- and three-dimensional statistics in mice. It has been shown that the pair wise linear methods of analysis did not reveal any significant correlation between the parameters under exploration. However, it became evident at three-dimensional and non-linear plotting for which the coefficients of multivariable correlation reached and even exceeded 0.7-0.8. The calculations based on partial differentiation of the multivariable regression equations allow to conclude that at certain values of VO2, VCO2 and body temperature negative relations between the systems of gaseous exchange and thermoregulation become dominating.
NASA Astrophysics Data System (ADS)
Bourke, Sarah A.; Hermann, Kristian J.; Hendry, M. Jim
2017-11-01
Elevated groundwater salinity associated with produced water, leaching from landfills or secondary salinity can degrade arable soils and potable water resources. Direct-push electrical conductivity (EC) profiling enables rapid, relatively inexpensive, high-resolution in-situ measurements of subsurface salinity, without requiring core collection or installation of groundwater wells. However, because the direct-push tool measures the bulk EC of both solid and liquid phases (ECa), incorporation of ECa data into regional or historical groundwater data sets requires the prediction of pore water EC (ECw) or chloride (Cl-) concentrations from measured ECa. Statistical linear regression and physically based models for predicting ECw and Cl- from ECa profiles were tested on a brine plume in central Saskatchewan, Canada. A linear relationship between ECa/ECw and porosity was more accurate for predicting ECw and Cl- concentrations than a power-law relationship (Archie's Law). Despite clay contents of up to 96%, the addition of terms to account for electrical conductance in the solid phase did not improve model predictions. In the absence of porosity data, statistical linear regression models adequately predicted ECw and Cl- concentrations from direct-push ECa profiles (ECw = 5.48 ECa + 0.78, R 2 = 0.87; Cl- = 1,978 ECa - 1,398, R 2 = 0.73). These statistical models can be used to predict ECw in the absence of lithologic data and will be particularly useful for initial site assessments. The more accurate linear physically based model can be used to predict ECw and Cl- as porosity data become available and the site-specific ECw-Cl- relationship is determined.
Analytic Methods for Adjusting Subjective Rating Schemes.
ERIC Educational Resources Information Center
Cooper, Richard V. L.; Nelson, Gary R.
Statistical and econometric techniques of correcting for supervisor bias in models of individual performance appraisal were developed, using a variant of the classical linear regression model. Location bias occurs when individual performance is systematically overestimated or underestimated, while scale bias results when raters either exaggerate…
Trend Analysis Using Microcomputers.
ERIC Educational Resources Information Center
Berger, Carl F.
A trend analysis statistical package and additional programs for the Apple microcomputer are presented. They illustrate strategies of data analysis suitable to the graphics and processing capabilities of the microcomputer. The programs analyze data sets using examples of: (1) analysis of variance with multiple linear regression; (2) exponential…
NASA Astrophysics Data System (ADS)
Hirst, Jonathan D.; King, Ross D.; Sternberg, Michael J. E.
1994-08-01
Neural networks and inductive logic programming (ILP) have been compared to linear regression for modelling the QSAR of the inhibition of E. coli dihydrofolate reductase (DHFR) by 2,4-diamino-5-(substitured benzyl)pyrimidines, and, in the subsequent paper [Hirst, J.D., King, R.D. and Sternberg, M.J.E., J. Comput.-Aided Mol. Design, 8 (1994) 421], the inhibition of rodent DHFR by 2,4-diamino-6,6-dimethyl-5-phenyl-dihydrotriazines. Cross-validation trials provide a statistically rigorous assessment of the predictive capabilities of the methods, with training and testing data selected randomly and all the methods developed using identical training data. For the ILP analysis, molecules are represented by attributes other than Hansch parameters. Neural networks and ILP perform better than linear regression using the attribute representation, but the difference is not statistically significant. The major benefit from the ILP analysis is the formulation of understandable rules relating the activity of the inhibitors to their chemical structure.
Analysis and generation of groundwater concentration time series
NASA Astrophysics Data System (ADS)
Crăciun, Maria; Vamoş, Călin; Suciu, Nicolae
2018-01-01
Concentration time series are provided by simulated concentrations of a nonreactive solute transported in groundwater, integrated over the transverse direction of a two-dimensional computational domain and recorded at the plume center of mass. The analysis of a statistical ensemble of time series reveals subtle features that are not captured by the first two moments which characterize the approximate Gaussian distribution of the two-dimensional concentration fields. The concentration time series exhibit a complex preasymptotic behavior driven by a nonstationary trend and correlated fluctuations with time-variable amplitude. Time series with almost the same statistics are generated by successively adding to a time-dependent trend a sum of linear regression terms, accounting for correlations between fluctuations around the trend and their increments in time, and terms of an amplitude modulated autoregressive noise of order one with time-varying parameter. The algorithm generalizes mixing models used in probability density function approaches. The well-known interaction by exchange with the mean mixing model is a special case consisting of a linear regression with constant coefficients.
The Influential Effect of Blending, Bump, Changing Period, and Eclipsing Cepheids on the Leavitt Law
NASA Astrophysics Data System (ADS)
García-Varela, A.; Muñoz, J. R.; Sabogal, B. E.; Vargas Domínguez, S.; Martínez, J.
2016-06-01
The investigation of the nonlinearity of the Leavitt law (LL) is a topic that began more than seven decades ago, when some of the studies in this field found that the LL has a break at about 10 days. The goal of this work is to investigate a possible statistical cause of this nonlinearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that to obtain the LL by using linear regression, robust techniques to deal with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, by using M- and MM-regressions we establish firmly and without doubt the linearity of the LL in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids suggesting blending, bumps, eclipses, or period changes do not affect the LL for this galaxy. For the Small Magellanic Cloud, when including Cepheids of this kind, it is not possible to find an adequate model, probably because of the geometry of the galaxy. In that case, a possible influence of these stars could exist.
Multiple regression technique for Pth degree polynominals with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
NASA Astrophysics Data System (ADS)
Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino
2018-07-01
Epidemiological studies generally use particulate matter measurements with diameter less 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data has considerable potential in predicting PM2.5 concentrations, and thus provides an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, sine, cosine of date in a period of 365,25 days and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products also in combination with the testing of a non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of Brazilian Amazon Region.
León, Larry F; Cai, Tianxi
2012-04-01
In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reects model misspecification or natural variation. We illustrate the methods with a well known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Statistical approach to the analysis of olive long-term pollen season trends in southern Spain.
García-Mozo, H; Yaezel, L; Oteros, J; Galán, C
2014-03-01
Analysis of long-term airborne pollen counts makes it possible not only to chart pollen-season trends but also to track changing patterns in flowering phenology. Changes in higher plant response over a long interval are considered among the most valuable bioindicators of climate change impact. Phenological-trend models can also provide information regarding crop production and pollen-allergen emission. The interest of this information makes essential the election of the statistical analysis for time series study. We analysed trends and variations in the olive flowering season over a 30-year period (1982-2011) in southern Europe (Córdoba, Spain), focussing on: annual Pollen Index (PI); Pollen Season Start (PSS), Peak Date (PD), Pollen Season End (PSE) and Pollen Season Duration (PSD). Apart from the traditional Linear Regression analysis, a Seasonal-Trend Decomposition procedure based on Loess (STL) and an ARIMA model were performed. Linear regression results indicated a trend toward delayed PSE and earlier PSS and PD, probably influenced by the rise in temperature. These changes are provoking longer flowering periods in the study area. The use of the STL technique provided a clearer picture of phenological behaviour. Data decomposition on pollination dynamics enabled the trend toward an alternate bearing cycle to be distinguished from the influence of other stochastic fluctuations. Results pointed to show a rising trend in pollen production. With a view toward forecasting future phenological trends, ARIMA models were constructed to predict PSD, PSS and PI until 2016. Projections displayed a better goodness of fit than those derived from linear regression. Findings suggest that olive reproductive cycle is changing considerably over the last 30years due to climate change. Further conclusions are that STL improves the effectiveness of traditional linear regression in trend analysis, and ARIMA models can provide reliable trend projections for future years taking into account the internal fluctuations in time series. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
Predicting Success in Psychological Statistics Courses.
Lester, David
2016-06-01
Many students perform poorly in courses on psychological statistics, and it is useful to be able to predict which students will have difficulties. In a study of 93 undergraduates enrolled in Statistical Methods (18 men, 75 women; M age = 22.0 years, SD = 5.1), performance was significantly associated with sex (female students performed better) and proficiency in algebra in a linear regression analysis. Anxiety about statistics was not associated with course performance, indicating that basic mathematical skills are the best correlate for performance in statistics courses and can usefully be used to stream students into classes by ability. © The Author(s) 2016.
Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J
2012-12-01
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.
Shimizu, Yu; Yoshimoto, Junichiro; Takamura, Masahiro; Okada, Go; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area. PMID:28700672
Supervised Learning for Dynamical System Learning.
Hefny, Ahmed; Downey, Carlton; Gordon, Geoffrey J
2015-01-01
Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as sparsity or structure. To address this problem, we present a new view of dynamical system learning: we show how to learn dynamical systems by solving a sequence of ordinary supervised learning problems, thereby allowing users to incorporate prior knowledge via standard techniques such as L 1 regularization. Many existing spectral methods are special cases of this new framework, using linear regression as the supervised learner. We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.
Tools for Basic Statistical Analysis
NASA Technical Reports Server (NTRS)
Luz, Paul L.
2005-01-01
Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x(i) (i = 1, 2, 3 . . . ) entered by the user. Normal Distribution Estimates will calculate the statistical value that corresponds to cumulative probability values, given a sample mean and standard deviation of the normal distribution. Normal Distribution from two Data Points will extend and generate a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA will curvefit data to the linear equation y=f(x) and will do an ANOVA to check its significance.
NASA Astrophysics Data System (ADS)
Abunama, Taher; Othman, Faridah
2017-06-01
Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to guarantee a sufficient treatment of wastewater before discharging it to the environment. The main objectives of this study are to statistically analyze and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years’ weekly influent data (156weeks) has been conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) have been tried to select the most fitted model, which was utilized to forecast the wastewater inflow rates. The linear regression analysis was applied to testify the correlation between the observed and predicted influents. ARIMA (3, 1, 3) model was selected with the highest significance R-square and lowest normalized Bayesian Information Criterion (BIC) value, and accordingly the wastewater inflow rates were forecasted to additional 52weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.
Analysis of reciprocal creatinine plots by two-phase linear regression.
Rowe, P A; Richardson, R E; Burton, P R; Morgan, A G; Burden, R P
1989-01-01
The progression of renal diseases is often monitored by the serial measurement of plasma creatinine. The slope of the linear relation that is frequently found between the reciprocal of creatinine concentration and time delineates the rate of change in renal function. Minor changes in slope, perhaps indicating response to therapeutic intervention, can be difficult to identify and yet be of clinical importance. We describe the application of two-phase linear regression to identify and characterise changes in slope using a microcomputer. The method fits two intersecting lines to the data by computing a least-squares estimate of the position of the slope change and its 95% confidence limits. This avoids the potential bias of fixing the change at a preconceived time corresponding with an alteration in treatment. The program then evaluates the statistical and clinical significance of the slope change and produces a graphical output to aid interpretation.
Key Results of Interaction Models with Centering
ERIC Educational Resources Information Center
Afshartous, David; Preston, Richard A.
2011-01-01
We consider the effect on estimation of simultaneous variable centering and interaction effects in linear regression. We technically define, review, and amplify many of the statistical issues for interaction models with centering in order to create a useful and compact reference for teachers, students, and applied researchers. In addition, we…
Social Inequality and Labor Force Participation.
ERIC Educational Resources Information Center
King, Jonathan
The labor force participation rates of whites, blacks, and Spanish-Americans, grouped by sex, are explained in a linear regression model fitted with 1970 U. S. Census data on Standard Metropolitan Statistical Area (SMSA). The explanatory variables are: average age, average years of education, vocational training rate, disabled rate, unemployment…
Statistical considerations in the analysis of data from replicated bioassays
USDA-ARS?s Scientific Manuscript database
Multiple-dose bioassay is generally the preferred method for characterizing virulence of insect pathogens. Linear regression of probit mortality on log dose enables estimation of LD50/LC50 and slope, the latter having substantial effect on LD90/95s (doses of considerable interest in pest management)...
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 20 2013-07-01 2013-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
40 CFR 86.1341-90 - Test cycle validation criteria.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 40 Protection of Environment 20 2012-07-01 2012-07-01 false Test cycle validation criteria. 86... Procedures § 86.1341-90 Test cycle validation criteria. (a) To minimize the biasing effect of the time lag... brake horsepower-hour. (c) Regression line analysis to calculate validation statistics. (1) Linear...
Advanced statistical methods for improved data analysis of NASA astrophysics missions
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.
1992-01-01
The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.
Arambasić, M B; Jatić-Slavković, D
2004-05-01
This paper presents the application of the regression analysis program and the program for comparing linear regressions (modified method for one-way, analysis of variance), writtens in BASIC program language, for instance, determination of content of Diclofenac-Sodium (active ingredient in DIKLOFEN injections, ampules á 75 mg/3 ml). Stability testing of Diclofenac-Sodium was done by isothermic method of accelerated aging at 4 different temperatures (30 degrees, 40 degrees, 50 degrees and 60 degrees C) as a function of time (4 different duration of treatment: (0-155, 0-145, 0-74 and 0-44 days). The decrease in stability (decrease in the mean value of the content of Diclofenac-Sodium (in %), at different temperatures as a function of time, is possible to describe by, linear dependance. According to the value for regression equation values, the times are assessed in which the content of Diclofenac-Sodium (in %) will decrease by 10%, of the initial value. The times are follows at 30 degrees C 761.02 days, at 40 degrees C 397.26 days, at 50 degrees C 201.96 days and at 60 degrees C 58.85 days. The estimated times (in days) in which the mean value for Diclofenac-Sodium content (in %) will by 10% of the initial values, as a junction of time, are most suitably described by 3rd order parabola. Based on the parameter values which describe the 3rd order parabola, the time was estimated in which Diclofenac-Sodium content mean value (in %) will fall by 10% of the initial one at average ambient temperatures of 20 degrees C and 25 degrees C. The times are: 1409.47 days (20 degrees C) and 1042.39 days (25 degrees C). Based on the value for Fischer's coefficien (F), the comparison of trenf of Diclofenac-Sodium content (in %) shows that, under the influence of different temperatures as a function of time, among them, depending on temperature value, there is: statistically very significant difference (P < .05) at 50 degrees C and lower toward 60 degrees C, i.e. statistically probably significant difference (P > 0.01) at 40 degrees C and lower towards 50 degrees C and there is no statistically significance difference (P > 0.05) at 30 degrees C towards 40 degrees C.
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make a not misleading inferences which explains a necessity and importancemore » of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.« less
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
NASA Astrophysics Data System (ADS)
Kiss, I.; Cioată, V. G.; Ratiu, S. A.; Rackov, M.; Penčić, M.
2018-01-01
Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. This article focuses on expressing the multiple linear regression model related to the hardness assurance by the chemical composition of the phosphorous cast irons destined to the brake shoes, having in view that the regression coefficients will illustrate the unrelated contributions of each independent variable towards predicting the dependent variable. In order to settle the multiple correlations between the hardness of the cast-iron brake shoes, and their chemical compositions several regression equations has been proposed. Is searched a mathematical solution which can determine the optimum chemical composition for the hardness desirable values. Starting from the above-mentioned affirmations two new statistical experiments are effectuated related to the values of Phosphorus [P], Manganese [Mn] and Silicon [Si]. Therefore, the regression equations, which describe the mathematical dependency between the above-mentioned elements and the hardness, are determined. As result, several correlation charts will be revealed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-09-14
This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic-but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
THE DISTRIBUTION OF COOK’S D STATISTIC
Muller, Keith E.; Mok, Mario Chen
2013-01-01
Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F for overall regression captures an erratically varying proportion of the values. We describe the exact distribution of Cook’s statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations. PMID:24363487
NASA Astrophysics Data System (ADS)
Xie, Yanan; Zhou, Mingliang; Pan, Dengke
2017-10-01
The forward-scattering model is introduced to describe the response of normalized radar cross section (NRCS) of precipitation with synthetic aperture radar (SAR). Since the distribution of near-surface rainfall is related to the rate of near-surface rainfall and horizontal distribution factor, a retrieval algorithm called modified regression empirical and model-oriented statistical (M-M) based on the volterra integration theory is proposed. Compared with the model-oriented statistical and volterra integration (MOSVI) algorithm, the biggest difference is that the M-M algorithm is based on the modified regression empirical algorithm rather than the linear regression formula to retrieve the value of near-surface rainfall rate. Half of the empirical parameters are reduced in the weighted integral work and a smaller average relative error is received while the rainfall rate is less than 100 mm/h. Therefore, the algorithm proposed in this paper can obtain high-precision rainfall information.
Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M
In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.
NASA Technical Reports Server (NTRS)
Gausman, H. W.; Everitt, J. H.; Escobar, D. E. (Principal Investigator)
1982-01-01
Annual variations in the nitrogen-chlorophyll leaf reflectance of hackberry, honey mesquite and live oak in south Texas, were compared. In spring, leaf reflectance at the 0.55 m wavelength and nitrogen (N) concentration was high but leaf chlorophyll (chl) concentrations were low. In summer, leaf reflectance and N-concentration were low but lead chl concentrations were high. Linear correlations for both spring and summer of leaf reflectance with N and chl concentration or deviations from linear regression were not statistically significant.
Wartberg, L; Kriston, L; Kramer, M; Schwedler, A; Lincoln, T M; Kammerl, R
2017-06-01
Internet gaming disorder (IGD) has been included in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). Currently, associations between IGD in early adolescence and mental health are largely unexplained. In the present study, the relation of IGD with adolescent and parental mental health was investigated for the first time. We surveyed 1095 family dyads (an adolescent aged 12-14 years and a related parent) with a standardized questionnaire for IGD as well as for adolescent and parental mental health. We conducted linear (dimensional approach) and logistic (categorical approach) regression analyses. Both with dimensional and categorical approaches, we observed statistically significant associations between IGD and male gender, a higher degree of adolescent antisocial behavior, anger control problems, emotional distress, self-esteem problems, hyperactivity/inattention and parental anxiety (linear regression model: corrected R 2 =0.41, logistic regression model: Nagelkerke's R 2 =0.41). IGD appears to be associated with internalizing and externalizing problems in adolescents. Moreover, the findings of the present study provide first evidence that not only adolescent but also parental mental health is relevant to IGD in early adolescence. Adolescent and parental mental health should be considered in prevention and intervention programs for IGD in adolescence. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
Extending local canonical correlation analysis to handle general linear contrasts for FMRI data.
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.
Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic. PMID:22461786
Nonlinear multivariate and time series analysis by neural network methods
NASA Astrophysics Data System (ADS)
Hsieh, William W.
2004-03-01
Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method.
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2015-11-18
Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. Results: From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
Online Statistical Modeling (Regression Analysis) for Independent Responses
NASA Astrophysics Data System (ADS)
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting
NASA Astrophysics Data System (ADS)
Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD
2018-01-01
Heavy rainfall can cause disaster, therefore need a forecast to predict rainfall intensity. Main factor that cause flooding is there is a high rainfall intensity and it makes the river become overcapacity. This will cause flooding around the area. Rainfall factor is a dynamic factor, so rainfall is very interesting to be studied. In order to support the rainfall forecasting, there are methods that can be used from Artificial Intelligence (AI) to statistic. In this research, we used Adaline for AI method and Regression for statistic method. The more accurate forecast result shows the method that used is good for forecasting the rainfall. Through those methods, we expected which is the best method for rainfall forecasting here.
Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2011-01-01
Students (N = 101) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without time advice) x 3 (with learning questions and corrective feedback, with…
ERIC Educational Resources Information Center
Fulmer, Gavin W.; Polikoff, Morgan S.
2014-01-01
An essential component in school accountability efforts is for assessments to be well-aligned with the standards or curriculum they are intended to measure. However, relatively little prior research has explored methods to determine statistical significance of alignment or misalignment. This study explores analyses of alignment as a special case…
Comparing and Contrasting Neural Net Solutions to Classical Statistical Solutions.
ERIC Educational Resources Information Center
Van Nelson, C.; Neff, Kathryn J.
Data from two studies in which subjects were classified as successful or unsuccessful were analyzed using neural net technology after being analyzed with a linear regression function. Data were obtained from admission records of 201 students admitted to undergraduate and 285 students admitted to graduate programs. Data included grade point…
Modeling Success: Using Preenrollment Data to Identify Academically At-Risk Students
ERIC Educational Resources Information Center
Gansemer-Topf, Ann M.; Compton, Jonathan; Wohlgemuth, Darin; Forbes, Greg; Ralston, Ekaterina
2015-01-01
Improving student success and degree completion is one of the core principles of strategic enrollment management. To address this principle, institutional data were used to develop a statistical model to identify academically at-risk students. The model employs multiple linear regression techniques to predict students at risk of earning below a…
Post-Modeling Histogram Matching of Maps Produced Using Regression Trees
Andrew J. Lister; Tonya W. Lister
2006-01-01
Spatial predictive models often use statistical techniques that in some way rely on averaging of values. Estimates from linear modeling are known to be susceptible to truncation of variance when the independent (predictor) variables are measured with error. A straightforward post-processing technique (histogram matching) for attempting to mitigate this effect is...
ERIC Educational Resources Information Center
Dubnjakovic, Ana
2012-01-01
The current study investigates factors influencing increase in reference transactions in a typical week in academic libraries across the United States of America. Employing multiple regression analysis and general linear modeling, variables of interest from the "Academic Library Survey (ALS) 2006" survey (sample size 3960 academic libraries) were…
Biomass equations for shrub species of Tamualipan thornscrub of North-Eastern Mexico
J. Navar; E. Mendez; A. Najera; J. Graciano; V. Dale; B. Parresol
2004-01-01
Nine additive allometric equations for computing above-ground, standing biomass were developed for the plant community and for each of 18 single species typical of the Tamaulipan thornscrub of north-eastern Mexico. Equations developed using additive procedures in seemingly unrelated linear regression provided statistical efficiency in total biomass estimates at the...
Development of statistical linear regression model for metals from transportation land uses.
Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung
2009-01-01
The transportation landuses possessing impervious surfaces such as highways, parking lots, roads, and bridges were recognized as the highly polluted non-point sources (NPSs) in the urban areas. Lots of pollutants from urban transportation are accumulating on the paved surfaces during dry periods and are washed-off during a storm. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has been engaged in several researches and monitoring to develop stormwater management policies and treatment systems for future implementation. The data over 131 storm events during May 2004 to September 2008 at eleven sites were analyzed to identify correlation relationships between particulates and metals, and to develop simple linear regression (SLR) model to estimate event mean concentration (EMC). Results indicate that there was no significant relationship between metals and TSS EMC. However, the SLR estimation models although not providing useful results are valuable indicators of high uncertainties that NPS pollution possess. Therefore, long term monitoring employing proper methods and precise statistical analysis of the data should be undertaken to eliminate these uncertainties.
Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah
2018-07-01
In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the Area Under the Receiver Operating Characteristic (AUROC), which showed the highest value, belonged to boosted regression trees (0.975) and the lowest value was recorded for generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, nevertheless, variability among the prediction of individual models was considerable. Therefore, to reduce uncertainty, creating more generalizable, more stable, and less sensitive models, ensemble forecasting approaches and in particular the EMmedian is recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation
NASA Astrophysics Data System (ADS)
Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.
A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC) pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
Assumptions of Statistical Tests: What Lies Beneath.
Jupiter, Daniel C
We have discussed many statistical tests and tools in this series of commentaries, and while we have mentioned the underlying assumptions of the tests, we have not explored them in detail. We stop to look at some of the assumptions of the t-test and linear regression, justify and explain them, mention what can go wrong when the assumptions are not met, and suggest some solutions in this case. Copyright © 2017 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Yoneoka, Daisuke; Henmi, Masayuki
2017-06-01
Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Estimation of stature from the foot and its segments in a sub-adult female population of North India
2011-01-01
Background Establishing personal identity is one of the main concerns in forensic investigations. Estimation of stature forms a basic domain of the investigation process in unknown and co-mingled human remains in forensic anthropology case work. The objective of the present study was to set up standards for estimation of stature from the foot and its segments in a sub-adult female population. Methods The sample for the study constituted 149 young females from the Northern part of India. The participants were aged between 13 and 18 years. Besides stature, seven anthropometric measurements that included length of the foot from each toe (T1, T2, T3, T4, and T5 respectively), foot breadth at ball (BBAL) and foot breadth at heel (BHEL) were measured on both feet in each participant using standard methods and techniques. Results The results indicated that statistically significant differences (p < 0.05) between left and right feet occur in both the foot breadth measurements (BBAL and BHEL). Foot length measurements (T1 to T5 lengths) did not show any statistically significant bilateral asymmetry. The correlation between stature and all the foot measurements was found to be positive and statistically significant (p-value < 0.001). Linear regression models and multiple regression models were derived for estimation of stature from the measurements of the foot. The present study indicates that anthropometric measurements of foot and its segments are valuable in the estimation of stature. Foot length measurements estimate stature with greater accuracy when compared to foot breadth measurements. Conclusions The present study concluded that foot measurements have a strong relationship with stature in the sub-adult female population of North India. Hence, the stature of an individual can be successfully estimated from the foot and its segments using different regression models derived in the study. The regression models derived in the study may be applied successfully for the estimation of stature in sub-adult females, whenever foot remains are brought for forensic examination. Stepwise multiple regression models tend to estimate stature more accurately than linear regression models in female sub-adults. PMID:22104433
Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam
2011-11-21
Establishing personal identity is one of the main concerns in forensic investigations. Estimation of stature forms a basic domain of the investigation process in unknown and co-mingled human remains in forensic anthropology case work. The objective of the present study was to set up standards for estimation of stature from the foot and its segments in a sub-adult female population. The sample for the study constituted 149 young females from the Northern part of India. The participants were aged between 13 and 18 years. Besides stature, seven anthropometric measurements that included length of the foot from each toe (T1, T2, T3, T4, and T5 respectively), foot breadth at ball (BBAL) and foot breadth at heel (BHEL) were measured on both feet in each participant using standard methods and techniques. The results indicated that statistically significant differences (p < 0.05) between left and right feet occur in both the foot breadth measurements (BBAL and BHEL). Foot length measurements (T1 to T5 lengths) did not show any statistically significant bilateral asymmetry. The correlation between stature and all the foot measurements was found to be positive and statistically significant (p-value < 0.001). Linear regression models and multiple regression models were derived for estimation of stature from the measurements of the foot. The present study indicates that anthropometric measurements of foot and its segments are valuable in the estimation of stature. Foot length measurements estimate stature with greater accuracy when compared to foot breadth measurements. The present study concluded that foot measurements have a strong relationship with stature in the sub-adult female population of North India. Hence, the stature of an individual can be successfully estimated from the foot and its segments using different regression models derived in the study. The regression models derived in the study may be applied successfully for the estimation of stature in sub-adult females, whenever foot remains are brought for forensic examination. Stepwise multiple regression models tend to estimate stature more accurately than linear regression models in female sub-adults.
An operational definition of a statistically meaningful trend.
Bryhn, Andreas C; Dimberg, Peter H
2011-04-28
Linear trend analysis of time series is standard procedure in many scientific disciplines. If the number of data is large, a trend may be statistically significant even if data are scattered far from the trend line. This study introduces and tests a quality criterion for time trends referred to as statistical meaningfulness, which is a stricter quality criterion for trends than high statistical significance. The time series is divided into intervals and interval mean values are calculated. Thereafter, r(2) and p values are calculated from regressions concerning time and interval mean values. If r(2) ≥ 0.65 at p ≤ 0.05 in any of these regressions, then the trend is regarded as statistically meaningful. Out of ten investigated time series from different scientific disciplines, five displayed statistically meaningful trends. A Microsoft Excel application (add-in) was developed which can perform statistical meaningfulness tests and which may increase the operationality of the test. The presented method for distinguishing statistically meaningful trends should be reasonably uncomplicated for researchers with basic statistics skills and may thus be useful for determining which trends are worth analysing further, for instance with respect to causal factors. The method can also be used for determining which segments of a time trend may be particularly worthwhile to focus on.
Redmond, Tony; O'Leary, Neil; Hutchison, Donna M; Nicolela, Marcelo T; Artes, Paul H; Chauhan, Balwantray C
2013-12-01
A new analysis method called permutation of pointwise linear regression measures the significance of deterioration over time at each visual field location, combines the significance values into an overall statistic, and then determines the likelihood of change in the visual field. Because the outcome is a single P value, individualized to that specific visual field and independent of the scale of the original measurement, the method is well suited for comparing techniques with different stimuli and scales. To test the hypothesis that frequency-doubling matrix perimetry (FDT2) is more sensitive than standard automated perimetry (SAP) in identifying visual field progression in glaucoma. Patients with open-angle glaucoma and healthy controls were examined by FDT2 and SAP, both with the 24-2 test pattern, on the same day at 6-month intervals in a longitudinal prospective study conducted in a hospital-based setting. Only participants with at least 5 examinations were included. Data were analyzed with permutation of pointwise linear regression. Permutation of pointwise linear regression is individualized to each participant, in contrast to current analyses in which the statistical significance is inferred from population-based approaches. Analyses were performed with both total deviation and pattern deviation. Sixty-four patients and 36 controls were included in the study. The median age, SAP mean deviation, and follow-up period were 65 years, -2.6 dB, and 5.4 years, respectively, in patients and 62 years, +0.4 dB, and 5.2 years, respectively, in controls. Using total deviation analyses, statistically significant deterioration was identified in 17% of patients with FDT2, in 34% of patients with SAP, and in 14% of patients with both techniques; in controls these percentages were 8% with FDT2, 31% with SAP, and 8% with both. Using pattern deviation analyses, statistically significant deterioration was identified in 16% of patients with FDT2, in 17% of patients with SAP, and in 3% of patients with both techniques; in controls these values were 3% with FDT2 and none with SAP. No evidence was found that FDT2 is more sensitive than SAP in identifying visual field deterioration. In about one-third of healthy controls, age-related deterioration with SAP reached statistical significance.
Guo, Ying; Little, Roderick J; McConnell, Daniel S
2012-01-01
Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case
NASA Astrophysics Data System (ADS)
Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann
2017-04-01
Short-term ocean analyses for Sea Surface Temperature SST in the Mediterranean Sea can be improved by a statistical post-processing technique, called super-ensemble. This technique consists in a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable to prevent overfitting problems, even if best performances are achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performances depend on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: 15 days training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least square algorithm being filtered a posteriori.
Luo, Ying-zhen; Tu, Meng; Fan, Fei; Zheng, Jie-qian; Yang, Ming; Li, Tao; Zhang, Kui; Deng, Zhen-hua
2015-06-01
To establish the linear regression equation between body height and combined length of manubrium and mesostenum of sternum measured by CT volume rendering technique (CT-VRT) in southwest Han population. One hundred and sixty subjects, including 80 males and 80 females were selected from southwest Han population for routine CT-VRT (reconstruction thickness 1 mm) examination. The lengths of both manubrium and mesosternum were recorded, and the combined length of manubrium and mesosternum was equal to the algebraic sum of them. The sex-specific linear regression equations between the combined length of manubrium and mesosternum and the real body height of each subject were deduced. The sex-specific simple linear regression equations between the combined length of manubrium and mesostenum (x3) and body height (y) were established (male: y = 135.000+2.118 x3 and female: y = 120.790+2.808 x3). Both equations showed statistical significance (P < 0.05) with a 100% predictive accuracy. CT-VRT is an effective method for measurement of the index of sternum. The combined length of manubrium and mesosternum from CT-VRT can be used for body height estimation in southwest Han population.
Modification of the USLE K factor for soil erodibility assessment on calcareous soils in Iran
NASA Astrophysics Data System (ADS)
Ostovari, Yaser; Ghorbani-Dashtaki, Shoja; Bahrami, Hossein-Ali; Naderi, Mehdi; Dematte, Jose Alexandre M.; Kerry, Ruth
2016-11-01
The measurement of soil erodibility (K) in the field is tedious, time-consuming and expensive; therefore, its prediction through pedotransfer functions (PTFs) could be far less costly and time-consuming. The aim of this study was to develop new PTFs to estimate the K factor using multiple linear regression, Mamdani fuzzy inference systems, and artificial neural networks. For this purpose, K was measured in 40 erosion plots with natural rainfall. Various soil properties including the soil particle size distribution, calcium carbonate equivalent, organic matter, permeability, and wet-aggregate stability were measured. The results showed that the mean measured K was 0.014 t h MJ- 1 mm- 1 and 2.08 times less than the estimated mean K (0.030 t h MJ- 1 mm- 1) using the USLE model. Permeability, wet-aggregate stability, very fine sand, and calcium carbonate were selected as independent variables by forward stepwise regression in order to assess the ability of multiple linear regression, Mamdani fuzzy inference systems and artificial neural networks to predict K. The calcium carbonate equivalent, which is not accounted for in the USLE model, had a significant impact on K in multiple linear regression due to its strong influence on the stability of aggregates and soil permeability. Statistical indices in validation and calibration datasets determined that the artificial neural networks method with the highest R2, lowest RMSE, and lowest ME was the best model for estimating the K factor. A strong correlation (R2 = 0.81, n = 40, p < 0.05) between the estimated K from multiple linear regression and measured K indicates that the use of calcium carbonate equivalent as a predictor variable gives a better estimation of K in areas with calcareous soils.
Suminski, Richard R; Robertson, Robert J; Goss, Fredric L; Olvera, Norma
2008-08-01
Whether the translation of verbal descriptors from English to Spanish affects the validity of the Children's OMNI Scale of Perceived Exertion is not known, so the validity of a Spanish version of the OMNI was examined with 32 boys and 36 girls (9 to 12 years old) for whom Spanish was the primary language. Oxygen consumption, ventilation, respiratory rate, respiratory exchange ratio, heart rate, and ratings of perceived exertion for the overall body (RPE-O) were measured during an incremental treadmill test. All response values displayed significant linear increases across test stages. The linear regression analyses indicated RPE-O values were distributed as positive linear functions of oxygen consumption, ventilation, respiratory rate, respiratory exchange ratio, heart rate, and percent of maximal oxygen consumption. All regression models were statistically significant. The Spanish OMNI Scale is valid for estimating exercise effort during walking and running amongst Hispanic youth whose primary language is Spanish.
Multivariate meta-analysis for non-linear and other multi-parameter associations
Gasparrini, A; Armstrong, B; Kenward, M G
2012-01-01
In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
Instructional Advice, Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2010-01-01
Undergraduate students (N = 97) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without instructional advice) x 2 (with or without time advice) x 2…
Modeling Outcomes with Floor or Ceiling Effects: An Introduction to the Tobit Model
ERIC Educational Resources Information Center
McBee, Matthew
2010-01-01
In gifted education research, it is common for outcome variables to exhibit strong floor or ceiling effects due to insufficient range of measurement of many instruments when used with gifted populations. Common statistical methods (e.g., analysis of variance, linear regression) produce biased estimates when such effects are present. In practice,…
Helping Students Assess the Relative Importance of Different Intermolecular Interactions
ERIC Educational Resources Information Center
Jasien, Paul G.
2008-01-01
A semi-quantitative model has been developed to estimate the relative effects of dispersion, dipole-dipole interactions, and H-bonding on the normal boiling points ("T[subscript b]") for a subset of simple organic systems. The model is based upon a statistical analysis using multiple linear regression on a series of straight-chain organic…
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; themore » analysis of variance; quality control procedures; and linear regression analysis.« less
Linear Least Squares for Correlated Data
NASA Technical Reports Server (NTRS)
Dean, Edwin B.
1988-01-01
Throughout the literature authors have consistently discussed the suspicion that regression results were less than satisfactory when the independent variables were correlated. Camm, Gulledge, and Womer, and Womer and Marcotte provide excellent applied examples of these concerns. Many authors have obtained partial solutions for this problem as discussed by Womer and Marcotte and Wonnacott and Wonnacott, which result in generalized least squares algorithms to solve restrictive cases. This paper presents a simple but relatively general multivariate method for obtaining linear least squares coefficients which are free of the statistical distortion created by correlated independent variables.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.
Ahearn, Elizabeth A.
2010-01-01
Multiple linear regression equations for determining flow-duration statistics were developed to estimate select flow exceedances ranging from 25- to 99-percent for six 'bioperiods'-Salmonid Spawning (November), Overwinter (December-February), Habitat Forming (March-April), Clupeid Spawning (May), Resident Spawning (June), and Rearing and Growth (July-October)-in Connecticut. Regression equations also were developed to estimate the 25- and 99-percent flow exceedances without reference to a bioperiod. In total, 32 equations were developed. The predictive equations were based on regression analyses relating flow statistics from streamgages to GIS-determined basin and climatic characteristics for the drainage areas of those streamgages. Thirty-nine streamgages (and an additional 6 short-term streamgages and 28 partial-record sites for the non-bioperiod 99-percent exceedance) in Connecticut and adjacent areas of neighboring States were used in the regression analysis. Weighted least squares regression analysis was used to determine the predictive equations; weights were assigned based on record length. The basin characteristics-drainage area, percentage of area with coarse-grained stratified deposits, percentage of area with wetlands, mean monthly precipitation (November), mean seasonal precipitation (December, January, and February), and mean basin elevation-are used as explanatory variables in the equations. Standard errors of estimate of the 32 equations ranged from 10.7 to 156 percent with medians of 19.2 and 55.4 percent to predict the 25- and 99-percent exceedances, respectively. Regression equations to estimate high and median flows (25- to 75-percent exceedances) are better predictors (smaller variability of the residual values around the regression line) than the equations to estimate low flows (less than 75-percent exceedance). The Habitat Forming (March-April) bioperiod had the smallest standard errors of estimate, ranging from 10.7 to 20.9 percent. In contrast, the Rearing and Growth (July-October) bioperiod had the largest standard errors, ranging from 30.9 to 156 percent. The adjusted coefficient of determination of the equations ranged from 77.5 to 99.4 percent with medians of 98.5 and 90.6 percent to predict the 25- and 99-percent exceedances, respectively. Descriptive information on the streamgages used in the regression, measured basin and climatic characteristics, and estimated flow-duration statistics are provided in this report. Flow-duration statistics and the 32 regression equations for estimating flow-duration statistics in Connecticut are stored on the U.S. Geological Survey World Wide Web application ?StreamStats? (http://water.usgs.gov/osw/streamstats/index.html). The regression equations developed in this report can be used to produce unbiased estimates of select flow exceedances statewide.
Bone mineral density and correlation factor analysis in normal Taiwanese children.
Shu, San-Ging
2007-01-01
Our aim was to establish reference data and linear regression equations for lumbar bone mineral density (BMD) in normal Taiwanese children. Several influencing factors of lumbar BMD were investigated. Two hundred fifty-seven healthy children were recruited from schools, 136 boys and 121 girls, aged 4-18 years were enrolled on a voluntary basis with written consent. Their height, weight, blood pressure, puberty stage, bone age and lumbar BMD (L2-4) by dual energy x-ray absorptiometry (DEXA) were measured. Data were analyzed using Pearson correlation and stepwise regression tests. All measurements increased with age. Prior to age 8, there was no gender difference. Parameters such as height, weight, and bone age (BA) in girls surpassed boys between ages 8-13 without statistical significance (p> or =0.05). This was reversed subsequently after age 14 in height (p<0.05). BMD difference had the same trend but was not statistically significant either. The influencing power of puberty stage and bone age over BMD was almost equal to or higher than that of height and weight. All the other factors correlated with BMD to variable powers. Multiple linear regression equations for boys and girls were formulated. BMD reference data is provided and can be used to monitor childhood pathological conditions. However, BMD in those with abnormal bone age or pubertal development could need modifications to ensure accuracy.
Reliable probabilities through statistical post-processing of ensemble predictions
NASA Astrophysics Data System (ADS)
Van Schaeybroeck, Bert; Vannitsem, Stéphane
2013-04-01
We develop post-processing or calibration approaches based on linear regression that make ensemble forecasts more reliable. We enforce climatological reliability in the sense that the total variability of the prediction is equal to the variability of the observations. Second, we impose ensemble reliability such that the spread around the ensemble mean of the observation coincides with the one of the ensemble members. In general the attractors of the model and reality are inhomogeneous. Therefore ensemble spread displays a variability not taken into account in standard post-processing methods. We overcome this by weighting the ensemble by a variable error. The approaches are tested in the context of the Lorenz 96 model (Lorenz 1996). The forecasts become more reliable at short lead times as reflected by a flatter rank histogram. Our best method turns out to be superior to well-established methods like EVMOS (Van Schaeybroeck and Vannitsem, 2011) and Nonhomogeneous Gaussian Regression (Gneiting et al., 2005). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Lorenz, E. N., 1996: Predictability - a problem partly solved. Proceedings, Seminar on Predictability ECMWF. 1, 1-18. [3] Van Schaeybroeck, B., and S. Vannitsem, 2011: Post-processing through linear regression, Nonlin. Processes Geophys., 18, 147.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Song, Yong-Ze; Yang, Hong-Lei; Peng, Jun-Huan; Song, Yi-Rong; Sun, Qian; Li, Yuan
2015-01-01
Particulate matter with an aerodynamic diameter <2.5 μm (PM2.5) represents a severe environmental problem and is of negative impact on human health. Xi'an City, with a population of 6.5 million, is among the highest concentrations of PM2.5 in China. In 2013, in total, there were 191 days in Xi’an City on which PM2.5 concentrations were greater than 100 μg/m3. Recently, a few studies have explored the potential causes of high PM2.5 concentration using remote sensing data such as the MODIS aerosol optical thickness (AOT) product. Linear regression is a commonly used method to find statistical relationships among PM2.5 concentrations and other pollutants, including CO, NO2, SO2, and O3, which can be indicative of emission sources. The relationships of these variables, however, are usually complicated and non-linear. Therefore, a generalized additive model (GAM) is used to estimate the statistical relationships between potential variables and PM2.5 concentrations. This model contains linear functions of SO2 and CO, univariate smoothing non-linear functions of NO2, O3, AOT and temperature, and bivariate smoothing non-linear functions of location and wind variables. The model can explain 69.50% of PM2.5 concentrations, with R2 = 0.691, which improves the result of a stepwise linear regression (R2 = 0.582) by 18.73%. The two most significant variables, CO concentration and AOT, represent 20.65% and 19.54% of the deviance, respectively, while the three other gas-phase concentrations, SO2, NO2, and O3 account for 10.88% of the total deviance. These results show that in Xi'an City, the traffic and other industrial emissions are the primary source of PM2.5. Temperature, location, and wind variables also non-linearly related with PM2.5. PMID:26540446
A gentle introduction to quantile regression for ecologists
Cade, B.S.; Noon, B.R.
2003-01-01
Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, all the factors that affect ecological processes are not measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (eg least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
Schaid, Daniel J
2010-01-01
Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
An Analysis of San Diego's Housing Market Using a Geographically Weighted Regression Approach
NASA Astrophysics Data System (ADS)
Grant, Christina P.
San Diego County real estate transaction data was evaluated with a set of linear models calibrated by ordinary least squares and geographically weighted regression (GWR). The goal of the analysis was to determine whether the spatial effects assumed to be in the data are best studied globally with no spatial terms, globally with a fixed effects submarket variable, or locally with GWR. 18,050 single-family residential sales which closed in the six months between April 2014 and September 2014 were used in the analysis. Diagnostic statistics including AICc, R2, Global Moran's I, and visual inspection of diagnostic plots and maps indicate superior model performance by GWR as compared to both global regressions.
NASA Technical Reports Server (NTRS)
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
Nonparametric estimation and testing of fixed effects panel data models
Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi
2009-01-01
In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335
NASA Astrophysics Data System (ADS)
Gomes, Dora Prata; Sequeira, Inês J.; Figueiredo, Carlos; Rueff, José; Brás, Aldina
2016-12-01
Human chromosomal fragile sites (CFSs) are heritable loci or regions of the human chromosomes prone to exhibit gaps, breaks and rearrangements. Determining the frequency of deletions and duplications in CFSs may contribute to explain the occurrence of human disease due to those rearrangements. In this study we analyzed the frequency of deletions and duplications in each human CFS. Statistical methods, namely data display, descriptive statistics and linear regression analysis were applied to analyze this dataset. We found that FRA15C, FRA16A and FRAXB are the most frequently involved CFSs in deletions and duplications occurring in the human genome.
Ghoreishi, Mohammad; Abdi-Shahshahani, Mehdi; Peyman, Alireza; Pourazizi, Mohsen
2018-02-21
The aim of this study was to determine the correlation between ocular biometric parameters and sulcus-to-sulcus (STS) diameter. This was a cross-sectional study of preoperative ocular biometry data of patients who were candidates for phakic intraocular lens (IOL) surgery. Subjects underwent ocular biometry analysis, including refraction error evaluation using an autorefractor and Orbscan topography for white-to-white (WTW) corneal diameter and measurement. Pentacam was used to perform WTW corneal diameter and measurements of minimum and maximum keratometry (K). Measurements of STS and angle-to-angle (ATA) were obtained using a 50-MHz B-mode ultrasound device. Anterior optical coherence tomography was performed for anterior chamber depth measurement. Pearson's correlation test and stepwise linear regression analysis were used to find a model to predict STS. Fifty-eight eyes of 58 patients were enrolled. Mean age ± standard deviation of sample was 28.95 ± 6.04 years. The Pearson's correlation coefficient between STS with WTW, ATA, mean K was 0.383, 0.492, and - 0.353, respectively, which was statistically significant (all P < 0.001). Using stepwise linear regression analysis, there is a statistically significant association between STS with WTW (P = 0.011) and mean K (P = 0.025). The standardized coefficient was 0.323 and - 0.284 for WTW and mean K, respectively. The stepwise linear regression analysis equation was: (STS = 9.549 + 0.518 WTW - 0.083 mean K). Based on our result, given the correlation of STS with WTW and mean K and potential of direct and essay measurement of WTW and mean K, it seems that current IOL sizing protocols could be estimating with WTW and mean K.
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M
2016-05-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE.Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model.The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor.A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.
Antecedents of students' achievement in statistics
NASA Astrophysics Data System (ADS)
Awaludin, Izyan Syazana; Razak, Ruzanna Ab; Harris, Hezlin; Selamat, Zarehan
2015-02-01
The applications of statistics in most fields have been vast. Many degree programmes at local universities require students to enroll in at least one statistics course. The standard of these courses varies across different degree programmes. This is because of students' diverse academic backgrounds in which some comes far from the field of statistics. The high failure rate in statistics courses for non-science stream students had been concerning every year. The purpose of this research is to investigate the antecedents of students' achievement in statistics. A total of 272 students participated in the survey. Multiple linear regression was applied to examine the relationship between the factors and achievement. We found that statistics anxiety was a significant predictor of students' achievement. We also found that students' age has significant effect to achievement. Older students are more likely to achieve lowers scores in statistics. Student's level of study also has a significant impact on their achievement in statistics.
Estimating labile particulate iron concentrations in coastal waters from remote sensing data
NASA Astrophysics Data System (ADS)
McGaraghan, Anna R.; Kudela, Raphael M.
2012-02-01
Owing to the difficulties inherent in measuring trace metals and the importance of iron as a limiting nutrient for biological systems, the ability to monitor particulate iron concentration remotely is desirable. This study examines the relationship between labile particulate iron, described here as weak acid leachable particulate iron or total dissolvable iron, and easily obtained bio-optical measurements. We develop a bio-optical proxy that can be used to estimate large-scale patterns of labile iron concentrations in surface waters, and we extend this by including other environmental variables in a multiple linear regression statistical model. By utilizing a ratio of optical backscatter and fluorescence obtained by satellite, we identify patterns in iron concentrations confirmed by traditional shipboard sampling. This basic relationship is improved with the addition of other environmental parameters in the statistical linear regression model. The optical proxy detects known temporal and spatial trends in average surface iron concentrations in Monterey Bay. The proxy is robust in that similar performance was obtained using two independent particulate iron data sets, but it exhibits weaker correlations than the full statistical model. This proxy will be a valuable tool for oceanographers seeking to monitor iron concentrations in coastal regions and allows for better understanding of the variability of labile particulate iron in surface waters to complement direct measurement of leachable particulate or total dissolvable iron.
Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.
2005-01-01
Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes.
Investigating Validity of Math 105 as Prerequisite to Math 201 among Undergraduate Students, Nigeria
ERIC Educational Resources Information Center
Zakariya, Yusuf F.
2016-01-01
In this study, the author examined the validity of MATH 105 as a prerequisite to MATH 201. The data for this study was extracted directly from the examination results logic of the university. Descriptive statistics in form of correlations and linear regressions were used to analyze the obtained data. Three research questions were formulated and…
ERIC Educational Resources Information Center
Al-Maamari, Faisal
2015-01-01
It is important to consider the question of whether teacher-, course-, and student-related factors affect student ratings of instructors in Student Evaluation of Teaching (SET) in English Language Teaching (ELT). This paper reports on a statistical analysis of SET in two large EFL programmes at a university setting in the Sultanate of Oman. I…
ERIC Educational Resources Information Center
Arnold, Carolyn L.; Kaufman, Phillip D.
This report examines the effects of both student and school characteristics on mathematics and science achievement levels in the third, seventh, and eleventh grades using data from the 1985-86 National Assessment of Educational Progress (NAEP). Analyses feature hierarchical linear models (HLM), a regression-like statistical technique that…
NASA Astrophysics Data System (ADS)
Sahoo, Sasmita; Jha, Madan K.
2013-12-01
The potential of multiple linear regression (MLR) and artificial neural network (ANN) techniques in predicting transient water levels over a groundwater basin were compared. MLR and ANN modeling was carried out at 17 sites in Japan, considering all significant inputs: rainfall, ambient temperature, river stage, 11 seasonal dummy variables, and influential lags of rainfall, ambient temperature, river stage and groundwater level. Seventeen site-specific ANN models were developed, using multi-layer feed-forward neural networks trained with Levenberg-Marquardt backpropagation algorithms. The performance of the models was evaluated using statistical and graphical indicators. Comparison of the goodness-of-fit statistics of the MLR models with those of the ANN models indicated that there is better agreement between the ANN-predicted groundwater levels and the observed groundwater levels at all the sites, compared to the MLR. This finding was supported by the graphical indicators and the residual analysis. Thus, it is concluded that the ANN technique is superior to the MLR technique in predicting spatio-temporal distribution of groundwater levels in a basin. However, considering the practical advantages of the MLR technique, it is recommended as an alternative and cost-effective groundwater modeling tool.
Lloyd-Jones, Luke R; Robinson, Matthew R; Yang, Jian; Visscher, Peter M
2018-04-01
Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure ( e.g. , a genetic variant) and is readably available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that only rely on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects. Copyright © 2018 by the Genetics Society of America.
The discovery of indicator variables for QSAR using inductive logic programming
NASA Astrophysics Data System (ADS)
King, Ross D.; Srinivasan, Ashwin
1997-11-01
A central problem in forming accurate regression equations in QSAR studies isthe selection of appropriate descriptors for the compounds under study. Wedescribe a novel procedure for using inductive logic programming (ILP) todiscover new indicator variables (attributes) for QSAR problems, and show thatthese improve the accuracy of the derived regression equations. ILP techniqueshave previously been shown to work well on drug design problems where thereis a large structural component or where clear comprehensible rules arerequired. However, ILP techniques have had the disadvantage of only being ableto make qualitative predictions (e.g. active, inactive) and not to predictreal numbers (regression). We unify ILP and linear regression techniques togive a QSAR method that has the strength of ILP at describing stericstructure, with the familiarity and power of linear regression. We evaluatedthe utility of this new QSAR technique by examining the prediction ofbiological activity with and without the addition of new structural indicatorvariables formed by ILP. In three out of five datasets examined the additionof ILP variables produced statistically better results (P < 0.01) over theoriginal description. The new ILP variables did not increase the overallcomplexity of the derived QSAR equations and added insight into possiblemechanisms of action. We conclude that ILP can aid in the process of drugdesign.
Schlairet, Maura C; Schlairet, Timothy James; Sauls, Denise H; Bellflowers, Lois
2015-03-01
Establishing the impact of the high-fidelity simulation environment on student performance, as well as identifying factors that could predict learning, would refine simulation outcome expectations among educators. The purpose of this quasi-experimental pilot study was to explore the impact of simulation on emotion and cognitive load among beginning nursing students. Forty baccalaureate nursing students participated in teaching simulations, rated their emotional state and cognitive load, and completed evaluation simulations. Two principal components of emotion were identified representing the pleasant activation and pleasant deactivation components of affect. Mean rating of cognitive load following simulation was high. Linear regression identiffed slight but statistically nonsignificant positive associations between principal components of emotion and cognitive load. Logistic regression identified a negative but statistically nonsignificant effect of cognitive load on assessment performance. Among lower ability students, a more pronounced effect of cognitive load on assessment performance was observed; this also was statistically non-significant. Copyright 2015, SLACK Incorporated.
A Study of the Effect of the Front-End Styling of Sport Utility Vehicles on Pedestrian Head Injuries
Qin, Qin; Chen, Zheng; Bai, Zhonghao; Cao, Libo
2018-01-01
Background The number of sport utility vehicles (SUVs) on China market is continuously increasing. It is necessary to investigate the relationships between the front-end styling features of SUVs and head injuries at the styling design stage for improving the pedestrian protection performance and product development efficiency. Methods Styling feature parameters were extracted from the SUV side contour line. And simplified finite element models were established based on the 78 SUV side contour lines. Pedestrian headform impact simulations were performed and validated. The head injury criterion of 15 ms (HIC15) at four wrap-around distances was obtained. A multiple linear regression analysis method was employed to describe the relationships between the styling feature parameters and the HIC15 at each impact point. Results The relationship between the selected styling features and the HIC15 showed reasonable correlations, and the regression models and the selected independent variables showed statistical significance. Conclusions The regression equations obtained by multiple linear regression can be used to assess the performance of SUV styling in protecting pedestrians' heads and provide styling designers with technical guidance regarding their artistic creations.
Guillaume, Bryan; Wang, Changqing; Poh, Joann; Shen, Mo Jun; Ong, Mei Lyn; Tan, Pei Fang; Karnani, Neerja; Meaney, Michael; Qiu, Anqi
2018-06-01
Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at http://www.bioeng.nus.edu.sg/cfa/Imaging_Genetics2.html. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Grantz, Erin; Haggard, Brian; Scott, J Thad
2018-06-12
We calculated four median datasets (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4) for approximately 100 Texas, USA reservoirs. Trend analyses of differences between dataset 1 and 3 medians indicated percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a- and transparency-TP relationships indicated threshold differences up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution or missing due to limitations of statistical methods biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in linear regression models relating transparency-TP between datasets 1, 2, and the more statistically robust datasets 3-4. Study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low 16% for multiple QL's. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
Gender disparities in prosthodontics: authorship and leadership, 13 years of observation.
Kongkiatkamon, Suchada; Yuan, Judy Chia-Chun; Lee, Damian J; Knoernschild, Kent L; Campbell, Stephen D; Sukotjo, Cortino
2010-10-01
The purpose of this study was to examine gender disparities in prosthodontics by reviewing the trend of female authorship in prosthodontic journals and exploring the role of female leadership in prosthodontic organizations and Advanced Education in Prosthodontic (AEP) programs. Three journals representing the prosthodontic specialty were selected to analyze the percentage of female dentist first and last (senior) authors for the years 1995, 2000, 2005, and 2008. Article inclusion criteria were restricted to the first or last authors who held at least a DMD/DDS/BDS degree and were from U.S. institutions. Data on female leadership in prosthodontic organizations and advanced education programs were collected, and the trends were studied. Descriptive statistics were used to analyze the data. A linear regression analysis was performed to investigate the proportion of female authorship compared to male in the dental literature. A Fisher's Exact Test was performed to contrast differences of female first and last authorship in the selected journals between years 1995 and 2008. Overall, there was no statistically significant linear increase in the proportion of either first or last female authorship compared to male authorship over time. With respect to each journal, the linear regression analysis showed that the increase of first female authorship was statistically significant (p= 0.016) compared to male authorship only in the Journal of Prosthetic Dentistry. The percentage of female presidents of prosthodontic organizations has been very limited. A similar trend was also observed in AEP program director positions. Over the past 13 years, female dentists' participation in prosthodontics literature authorship has not increased significantly in the United States. Furthermore, female involvement in prosthodontics leadership has been limited over the past decades. © 2010 by The American College of Prosthodontists.
Asquith, William H.; Thompson, David B.
2008-01-01
The U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, investigated a refinement of the regional regression method and developed alternative equations for estimation of peak-streamflow frequency for undeveloped watersheds in Texas. A common model for estimation of peak-streamflow frequency is based on the regional regression method. The current (2008) regional regression equations for 11 regions of Texas are based on log10 transformations of all regression variables (drainage area, main-channel slope, and watershed shape). Exclusive use of log10-transformation does not fully linearize the relations between the variables. As a result, some systematic bias remains in the current equations. The bias results in overestimation of peak streamflow for both the smallest and largest watersheds. The bias increases with increasing recurrence interval. The primary source of the bias is the discernible curvilinear relation in log10 space between peak streamflow and drainage area. Bias is demonstrated by selected residual plots with superimposed LOWESS trend lines. To address the bias, a statistical framework based on minimization of the PRESS statistic through power transformation of drainage area is described and implemented, and the resulting regression equations are reported. Compared to log10-exclusive equations, the equations derived from PRESS minimization have PRESS statistics and residual standard errors less than the log10 exclusive equations. Selected residual plots for the PRESS-minimized equations are presented to demonstrate that systematic bias in regional regression equations for peak-streamflow frequency estimation in Texas can be reduced. Because the overall error is similar to the error associated with previous equations and because the bias is reduced, the PRESS-minimized equations reported here provide alternative equations for peak-streamflow frequency estimation.
Estimating linear effects in ANOVA designs: the easy way.
Pinhas, Michal; Tzelgov, Joseph; Ganor-Stern, Dana
2012-09-01
Research in cognitive science has documented numerous phenomena that are approximated by linear relationships. In the domain of numerical cognition, the use of linear regression for estimating linear effects (e.g., distance and SNARC effects) became common following Fias, Brysbaert, Geypens, and d'Ydewalle's (1996) study on the SNARC effect. While their work has become the model for analyzing linear effects in the field, it requires statistical analysis of individual participants and does not provide measures of the proportions of variability accounted for (cf. Lorch & Myers, 1990). In the present methodological note, using both the distance and SNARC effects as examples, we demonstrate how linear effects can be estimated in a simple way within the framework of repeated measures analysis of variance. This method allows for estimating effect sizes in terms of both slope and proportions of variability accounted for. Finally, we show that our method can easily be extended to estimate linear interaction effects, not just linear effects calculated as main effects.
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.
Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-04-01
To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
NASA Technical Reports Server (NTRS)
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that a totally different criterion, labile trace element contents - hence thermal histories - or 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
NASA Technical Reports Server (NTRS)
Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method.
Jiang, Yuan; He, Yunxiao; Zhang, Heping
LASSO is a popular statistical tool often used in conjunction with generalized linear models that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely, prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding in the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.
Modeling the microstructurally dependent mechanical properties of poly(ester-urethane-urea)s.
Warren, P Daniel; Sycks, Dalton G; McGrath, Dominic V; Vande Geest, Jonathan P
2013-12-01
Poly(ester-urethane-urea) (PEUU) is one of many synthetic biodegradable elastomers under scrutiny for biomedical and soft tissue applications. The goal of this study was to investigate the effect of the experimental parameters on mechanical properties of PEUUs following exposure to different degrading environments, similar to that of the human body, using linear regression, producing one predictive model. The model utilizes two independent variables of poly(caprolactone) (PCL) type and copolymer crystallinity to predict the dependent variable of maximum tangential modulus (MTM). Results indicate that comparisons between PCLs at different degradation states are statistically different (p < 0.0003), while the difference between experimental and predicted average MTM is statistically negligible (p < 0.02). The linear correlation between experimental and predicted MTM values is R(2) = 0.75. Copyright © 2013 Wiley Periodicals, Inc., a Wiley Company.
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
NASA Astrophysics Data System (ADS)
Singh, S.; Jaishi, H. P.; Tiwari, R. P.; Tiwari, R. C.
2017-07-01
This paper reports the analysis of soil radon data recorded in the seismic zone-V, located in the northeastern part of India (latitude 23.73N, longitude 92.73E). Continuous measurements of soil-gas emission along Chite fault in Mizoram (India) were carried out with the replacement of solid-state nuclear track detectors at weekly interval. The present study was done for the period from March 2013 to May 2015 using LR-115 Type II detectors, manufactured by Kodak Pathe, France. In order to reduce the influence of meteorological parameters, statistical analysis tools such as multiple linear regression and artificial neural network have been used. Decrease in radon concentration was recorded prior to some earthquakes that occurred during the observation period. Some false anomalies were also recorded which may be attributed to the ongoing crustal deformation which was not major enough to produce an earthquake.
Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood
Bondell, Howard D.; Stefanski, Leonard A.
2013-01-01
Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805
HOS network-based classification of power quality events via regression algorithms
NASA Astrophysics Data System (ADS)
Palomares Salas, José Carlos; González de la Rosa, Juan José; Sierra Fernández, José María; Pérez, Agustín Agüera
2015-12-01
This work compares seven regression algorithms implemented in artificial neural networks (ANNs) supported by 14 power-quality features, which are based in higher-order statistics. Combining time and frequency domain estimators to deal with non-stationary measurement sequences, the final goal of the system is the implementation in the future smart grid to guarantee compatibility between all equipment connected. The principal results are based in spectral kurtosis measurements, which easily adapt to the impulsive nature of the power quality events. These results verify that the proposed technique is capable of offering interesting results for power quality (PQ) disturbance classification. The best results are obtained using radial basis networks, generalized regression, and multilayer perceptron, mainly due to the non-linear nature of data.
Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08
Casey, Martin F.; Neidell, Matthew
2013-01-01
Background Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods And Findings We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H
2018-01-01
Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.
Simultaneous multiple non-crossing quantile regression estimation using kernel constraints
Liu, Yufeng; Wu, Yichao
2011-01-01
Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation. PMID:22190842
Monthly monsoon rainfall forecasting using artificial neural networks
NASA Astrophysics Data System (ADS)
Ganti, Ravikumar
2014-10-01
Indian agriculture sector heavily depends on monsoon rainfall for successful harvesting. In the past, prediction of rainfall was mainly performed using regression models, which provide reasonable accuracy in the modelling and forecasting of complex physical systems. Recently, Artificial Neural Networks (ANNs) have been proposed as efficient tools for modelling and forecasting. A feed-forward multi-layer perceptron type of ANN architecture trained using the popular back-propagation algorithm was employed in this study. Other techniques investigated for modeling monthly monsoon rainfall include linear and non-linear regression models for comparison purposes. The data employed in this study include monthly rainfall and monthly average of the daily maximum temperature in the North Central region in India. Specifically, four regression models and two ANN model's were developed. The performance of various models was evaluated using a wide variety of standard statistical parameters and scatter plots. The results obtained in this study for forecasting monsoon rainfalls using ANNs have been encouraging. India's economy and agricultural activities can be effectively managed with the help of the availability of the accurate monsoon rainfall forecasts.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
NASA Technical Reports Server (NTRS)
Stolzer, Alan J.; Halford, Carl
2007-01-01
In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
Hu, Yin; Niu, Yong; Wang, Dandan; Wang, Ying; Holden, Brien A; He, Mingguang
2015-01-22
Structural changes of retinal vasculature, such as altered retinal vascular calibers, are considered as early signs of systemic vascular damage. We examined the associations of 5-year mean level, longitudinal trend, and fluctuation in fasting plasma glucose (FPG) with retinal vascular caliber in people without established diabetes. A prospective study was conducted in a cohort of Chinese people age ≥40 years in Guangzhou, southern China. The FPG was measured at baseline in 2008 and annually until 2012. In 2012, retinal vascular caliber was assessed using standard fundus photographs and validated software. A total of 3645 baseline nondiabetic participants with baseline and follow-up data on FPG for 3 or more visits was included for statistical analysis. The associations of retinal vascular caliber with 5-year mean FPG level, longitudinal FPG trend (slope of linear regression-FPG), and fluctuation (standard deviation and root mean square error of FPG) were analyzed using multivariable linear regression analyses. Multivariate regression models adjusted for baseline FPG and other potential confounders showed that a 10% annual increase in FPG was associated independently with a 2.65-μm narrowing in retinal arterioles (P = 0.008) and a 3.47-μm widening in venules (P = 0. 0.004). Associations with mean FPG level and fluctuation were not statistically significant. Annual rising trend in FPG, but not its mean level or fluctuation, is associated with altered retinal vasculature in nondiabetic people. Copyright 2015 The Association for Research in Vision and Ophthalmology, Inc.
Bramness, Jørgen G; Walby, Fredrik A; Morken, Gunnar; Røislien, Jo
2015-08-01
Seasonal variation in the number of suicides has long been acknowledged. It has been suggested that this seasonality has declined in recent years, but studies have generally used statistical methods incapable of confirming this. We examined all suicides occurring in Norway during 1969-2007 (more than 20,000 suicides in total) to establish whether seasonality decreased over time. Fitting of additive Fourier Poisson time-series regression models allowed for formal testing of a possible linear decrease in seasonality, or a reduction at a specific point in time, while adjusting for a possible smooth nonlinear long-term change without having to categorize time into discrete yearly units. The models were compared using Akaike's Information Criterion and analysis of variance. A model with a seasonal pattern was significantly superior to a model without one. There was a reduction in seasonality during the period. Both the model assuming a linear decrease in seasonality and the model assuming a change at a specific point in time were both superior to a model assuming constant seasonality, thus confirming by formal statistical testing that the magnitude of the seasonality in suicides has diminished. The additive Fourier Poisson time-series regression model would also be useful for studying other temporal phenomena with seasonal components. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
NASA Technical Reports Server (NTRS)
Flynn, Clare; Pickering, Kenneth E.; Crawford, James H.; Lamsol, Lok; Krotkov, Nickolay; Herman, Jay; Weinheimer, Andrew; Chen, Gao; Liu, Xiong; Szykman, James;
2014-01-01
To investigate the ability of column (or partial column) information to represent surface air quality, results of linear regression analyses between surface mixing ratio data and column abundances for O3 and NO2 are presented for the July 2011 Maryland deployment of the DISCOVER-AQ mission. Data collected by the P-3B aircraft, ground-based Pandora spectrometers, Aura/OMI satellite instrument, and simulations for July 2011 from the CMAQ air quality model during this deployment provide a large and varied data set, allowing this problem to be approached from multiple perspectives. O3 columns typically exhibited a statistically significant and high degree of correlation with surface data (R(sup 2) > 0.64) in the P- 3B data set, a moderate degree of correlation (0.16 < R(sup 2) < 0.64) in the CMAQ data set, and a low degree of correlation (R(sup 2) < 0.16) in the Pandora and OMI data sets. NO2 columns typically exhibited a low to moderate degree of correlation with surface data in each data set. The results of linear regression analyses for O3 exhibited smaller errors relative to the observations than NO2 regressions. These results suggest that O3 partial column observations from future satellite instruments with sufficient sensitivity to the lower troposphere can be meaningful for surface air quality analysis.
ERIC Educational Resources Information Center
Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul
2012-01-01
The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…
ERIC Educational Resources Information Center
Large, Matthew; Nielssen, Olav; Lackersteen, Steven; Smith, Glen
2010-01-01
Previous studies have found that rates of homicide of children aged under one (infant homicide) are associated with rates of suicide, but not with rates of homicide. Linear regression was used to examine associations among infant homicide, homicide, and suicide in samples of regions in the United States and other countries. Infant homicide rates…
NASA Astrophysics Data System (ADS)
Kirchner-Bossi, Nicolas; Befort, Daniel J.; Wild, Simon B.; Ulbrich, Uwe; Leckebusch, Gregor C.
2016-04-01
Time-clustered winter storms are responsible for a majority of the wind-induced losses in Europe. Over last years, different atmospheric and oceanic large-scale mechanisms as the North Atlantic Oscillation (NAO) or the Meridional Overturning Circulation (MOC) have been proven to drive some significant portion of the windstorm variability over Europe. In this work we systematically investigate the influence of different large-scale natural variability modes: more than 20 indices related to those mechanisms with proven or potential influence on the windstorm frequency variability over Europe - mostly SST- or pressure-based - are derived by means of ECMWF ERA-20C reanalysis during the last century (1902-2009), and compared to the windstorm variability for the European winter (DJF). Windstorms are defined and tracked as in Leckebusch et al. (2008). The derived indices are then employed to develop a statistical procedure including a stepwise Multiple Linear Regression (MLR) and an Artificial Neural Network (ANN), aiming to hindcast the inter-annual (DJF) regional windstorm frequency variability in a case study for the British Isles. This case study reveals 13 indices with a statistically significant coupling with seasonal windstorm counts. The Scandinavian Pattern (SCA) showed the strongest correlation (0.61), followed by the NAO (0.48) and the Polar/Eurasia Pattern (0.46). The obtained indices (standard-normalised) are selected as predictors for a windstorm variability hindcast model applied for the British Isles. First, a stepwise linear regression is performed, to identify which mechanisms can explain windstorm variability best. Finally, the indices retained by the stepwise regression are used to develop a multlayer perceptron-based ANN that hindcasted seasonal windstorm frequency and clustering. Eight indices (SCA, NAO, EA, PDO, W.NAtl.SST, AMO (unsmoothed), EA/WR and Trop.N.Atl SST) are retained by the stepwise regression. Among them, SCA showed the highest linear coefficient, followed by SST in western Atlantic, AMO and NAO. The explanatory regression model (considering all time steps) provided a Coefficient of Determination (R^2) of 0.75. A predictive version of the linear model applying a leave-one-out cross-validation (LOOCV) shows an R2 of 0.56 and a relative RMSE of 4.67 counts/season. An ANN-based nonlinear hindcast model for the seasonal windstorm frequency is developed with the aim to improve the stepwise hindcast ability and thus better predict a time-clustered season over the case study. A 7 node-hidden layer perceptron is set, and the LOOCV procedure reveals a R2 of 0.71. In comparison to the stepwise MLR the RMSE is reduced a 20%. This work shows that for the British Isles case study, most of the interannual variability can be explained by certain large-scale mechanisms, considering also nonlinear effects (ANN). This allows to discern a time-clustered season from a non-clustered one - a key issue for applications e.g., in the (re)insurance industry.
NASA Astrophysics Data System (ADS)
Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea
2016-12-01
The assessment of metals and organic micropollutants contamination in agricultural soils is a difficult challenge due to the extensive area used to collect and analyze a very large number of samples. With Dioxins and dioxin-like PCBs measurement methods and subsequent the treatment of data, the European Community advises the develop low-cost and fast methods allowing routing analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work has been to find a method suitable to describe the relations occurring between organic and inorganic contaminants and use the value of the latter in order to forecast the former. In practice, the use of a metal portable soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique has shown to be an excellent forecasting method, though there is no linear correlation between the variables to be analyzed.
The Relationship between TOC and pH with Exchangeable Heavy Metal Levels in Lithuanian Podzols
NASA Astrophysics Data System (ADS)
Khaledian, Yones; Pereira, Paulo; Brevik, Eric C.; Pundyte, Neringa; Paliulis, Dainius
2017-04-01
Heavy metals can have a negative impact on public and environmental health. The objective of this study was to investigate the relationship between total organic carbon (TOC) and pH with exchangeable heavy metals (Pb, Cd, Cu and Zn) in order to predict exchangeable heavy metal content in soils sampled near Panevėžys and Kaunas, Lithuania. Principal component regression (PCR) and nonlinear regression methods were tested to find the statistical relationship between TOC and pH with heavy metals. The results of PCR [R2 = 0.68, RMSE = 0.07] and non-linear regression [R2 = 0.74, RMSE= 0.065] (pH with TOC and exchangeable parameters) were statistically significant. However, this was not observed in the relationships of pH and TOC separately with exchangeable heavy metals. The results indicated that pH had a higher correlation with exchangeable heavy metals (non-linear regression [R2 = 0.72, RMSE= 0.066]) than TOC with heavy metals [R2 = 0.30, RMSE= 0.004]. It can be concluded that even though there was a strong relationship between TOC and pH with exchangeable metals, the metal mobility (exchangeable metals) can be explained by pH better than TOC in this study. Finally, manipulating soil pH could likely be productive to assess and control heavy metals when financial and time limitations exist (Khaledian et al. 2016). Reference(s) Khaledian Y, Pereira P, Brevik E.C, Pundyte N, Paliulis D. 2016. The Influence of Organic Carbon and pH on Heavy Metals, Potassium, and Magnesium Levels in Lithuanian Podzols. Land Degradation and Development. DOI: 10.1002/ldr.2638
NASA Astrophysics Data System (ADS)
Alp, E.; Yücel, Ö.; Özcan, Z.
2014-12-01
Turkey has been making many legal arrangements for sustainable water management during the harmonization process with the European Union. In order to make cost effective and efficient decisions, monitoring network in Turkey has been expanding. However, due to time and budget constraints, desired number of monitoring campaigns can not be carried. Hence, in this study, independent parameters that can be measured easily and quickly are used to estimate water quality parameters in Lake Mogan and Eymir using linear regression. Nonpoint sources are one of the major pollutant components in Eymir and Mogan lakes. In this paper, a correlation between easily measurable parameters, DO, temperature, electrical conductivity, pH, precipitation and dependent variables, TN, TP, COD, Chl-a, TSS, Total Coliform is investigated. Simple regression analysis is performed for each season in Eymir and Mogan lakes by using SPSS Statistical program using the water quality data collected between 2006-2012. Regression analysis demonstrated significant linear relationship between measured and simulated concentrations for TN (R2=0.86), TP (R2=0.85), TSS (R2=0.91), Chl-a (R2=0.94), COD (R2=0.99), T. Coliform (R2=0.97) which are the best results in each season for Eymir and Mogan Lakes. The overall results of this study shows that by using easily measurable parameters even in ungauged situation the water quality of lakes can be predicted. Moreover, the outputs obtained from the regression equations can be used as an input for water quality models such as phosphorus budget model which is used to calculate the required reduction in the external phosphorus load to Lake Mogan to meet the water quality standards.
Ludbrook, John
2010-07-01
1. There are two reasons for wanting to compare measurers or methods of measurement. One is to calibrate one method or measurer against another; the other is to detect bias. Fixed bias is present when one method gives higher (or lower) values across the whole range of measurement. Proportional bias is present when one method gives values that diverge progressively from those of the other. 2. Linear regression analysis is a popular method for comparing methods of measurement, but the familiar ordinary least squares (OLS) method is rarely acceptable. The OLS method requires that the x values are fixed by the design of the study, whereas it is usual that both y and x values are free to vary and are subject to error. In this case, special regression techniques must be used. 3. Clinical chemists favour techniques such as major axis regression ('Deming's method'), the Passing-Bablok method or the bivariate least median squares method. Other disciplines, such as allometry, astronomy, biology, econometrics, fisheries research, genetics, geology, physics and sports science, have their own preferences. 4. Many Monte Carlo simulations have been performed to try to decide which technique is best, but the results are almost uninterpretable. 5. I suggest that pharmacologists and physiologists should use ordinary least products regression analysis (geometric mean regression, reduced major axis regression): it is versatile, can be used for calibration or to detect bias and can be executed by hand-held calculator or by using the loss function in popular, general-purpose, statistical software.
NASA Astrophysics Data System (ADS)
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observation, level of collinearity and percentage of outliers used. However, when outliers occurred in only single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.
Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Background Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. PMID:26053876
Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.
Health tourism on the rise? Evidence from the Balance of Payments Statistics.
Loh, Chung-Ping A
2014-09-01
The study assesses the presence and magnitude of global trends in health tourism using health-related travel (HRT) spending reported in the International Monetary Fund's Balance of Payments Statistics database. Linear regression and quantile regression are applied to estimate secular trends of the import and export of HRT based on a sample of countries from 2003 to 2009. The results show that from 2003 to 2009 the import and export of health tourism rose among countries with a high volume of such activities (accounting for the upper 40% of the countries), but not among those with a low volume. The uneven growth in health tourism has generated greater contrast between countries with high and low volumes of health tourism activities. However, the growth in the total import of health tourism did not outpace the population growth, implying that in general the population's tendency to engage in health tourism remained static.
NASA Astrophysics Data System (ADS)
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2018-01-01
Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
Analysis of a Rocket Based Combined Cycle Engine during Rocket Only Operation
NASA Technical Reports Server (NTRS)
Smith, T. D.; Steffen, C. J., Jr.; Yungster, S.; Keller, D. J.
1998-01-01
The all rocket mode of operation is a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. However, outside of performing experiments or a full three dimensional analysis, there are no first order parametric models to estimate performance. As a result, an axisymmetric RBCC engine was used to analytically determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and statistical regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, percent of injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inject diameter ratio. A perfect gas computational fluid dynamics analysis was performed to obtain values of vacuum specific impulse. Statistical regression analysis was performed based on both full flow and gas generator engine cycles. Results were also found to be dependent upon the entire cycle assumptions. The statistical regression analysis determined that there were five significant linear effects, six interactions, and one second-order effect. Two parametric models were created to provide performance assessments of an RBCC engine in the all rocket mode of operation.
NASA Astrophysics Data System (ADS)
Nuccitelli, Dana; Cowtan, Kevin; Jacobs, Peter; Richardson, Mark; Way, Robert G.; Blackburn, Anne-Marie; Stolpe, Martin B.; Cook, John
2014-04-01
Lu (2013) (L13) argued that solar effects and anthropogenic halogenated gases can explain most of the observed warming of global mean surface air temperatures since 1850, with virtually no contribution from atmospheric carbon dioxide (CO2) concentrations. Here we show that this conclusion is based on assumptions about the saturation of the CO2-induced greenhouse effect that have been experimentally falsified. L13 also confuses equilibrium and transient response, and relies on data sources that have been superseeded due to known inaccuracies. Furthermore, the statistical approach of sequential linear regression artificially shifts variance onto the first predictor. L13's artificial choice of regression order and neglect of other relevant data is the fundamental cause of the incorrect main conclusion. Consideration of more modern data and a more parsimonious multiple regression model leads to contradiction with L13's statistical results. Finally, the correlation arguments in L13 are falsified by considering either the more appropriate metric of global heat accumulation, or data on longer timescales.
Predictors of effects of lifestyle intervention on diabetes mellitus type 2 patients.
Jacobsen, Ramune; Vadstrup, Eva; Røder, Michael; Frølich, Anne
2012-01-01
The main aim of the study was to identify predictors of the effects of lifestyle intervention on diabetes mellitus type 2 patients by means of multivariate analysis. Data from a previously published randomised clinical trial, which compared the effects of a rehabilitation programme including standardised education and physical training sessions in the municipality's health care centre with the same duration of individual counseling in the diabetes outpatient clinic, were used. Data from 143 diabetes patients were analysed. The merged lifestyle intervention resulted in statistically significant improvements in patients' systolic blood pressure, waist circumference, exercise capacity, glycaemic control, and some aspects of general health-related quality of life. The linear multivariate regression models explained 45% to 80% of the variance in these improvements. The baseline outcomes in accordance to the logic of the regression to the mean phenomenon were the only statistically significant and robust predictors in all regression models. These results are important from a clinical point of view as they highlight the more urgent need for and better outcomes following lifestyle intervention for those patients who have worse general and disease-specific health.
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.
Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R
2018-06-01
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the amount of diagnostic statistics, in comparison to their parametric counter-parts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the HSIC. If dependence exists, the model does not capture all the variability in the outcome associated with the covariates, otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of-possibly redundant-inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data
Ying, Gui-shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-01-01
Purpose To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. Methods We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field data in the elderly. Results When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI −0.03 to 0.32D, P=0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, P=0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller P-values, while analysis of the worse eye provided larger P-values than mixed effects models and marginal models. Conclusion In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision. PMID:28102741
Abu Bakar, S N; Aspalilah, A; AbdelNasser, I; Nurliza, A; Hairuliza, M J; Swarhib, M; Das, S; Mohd Nor, F
2017-01-01
Stature is one of the characteristics that could be used to identify human, besides age, sex and racial affiliation. This is useful when the body found is either dismembered, mutilated or even decomposed, and helps in narrowing down the missing person's identity. The main aim of the present study was to construct regression functions for stature estimation by using lower limb bones in the Malaysian population. The sample comprised 87 adult individuals (81 males, 6 females) aged between 20 to 79 years. The parameters such as thigh length, lower leg length, leg length, foot length, foot height and foot breadth were measured. They were measured by a ruler and measuring tape. Statistical analysis involved independent t-test to analyse the difference between lower limbs in male and female. The Pearson's correlation test was used to analyse correlations between lower limb parameters and stature, and the linear regressions were used to form equations. The paired t-test was used to compare between actual stature and estimated stature by using the equations formed. Using independent t-test, there was a significant difference (p< 0.05) in the measurement between males and females with regard to leg length, thigh length, lower leg length, foot length and foot breadth. The thigh length, leg length and foot length were observed to have strong correlations with stature with p= 0.75, p= 0.81 and p= 0.69, respectively. Linear regressions were formulated for stature estimation. Paired t-test showed no significant difference between actual stature and estimated stature. It is concluded that regression functions can be used to estimate stature to identify skeletal remains in the Malaysia population.
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.
Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C
2008-07-21
The purpose of this study was both putting forward a statistically correct model for film calibration and the optimization of this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes, an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve with the dose as a function of OD (inverse regression) or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.
Hassan, A K
2015-01-01
In this work, O/W emulsion sets were prepared by using different concentrations of two nonionic surfactants. The two surfactants, tween 80(HLB=15.0) and span 80(HLB=4.3) were used in a fixed proportions equal to 0.55:0.45 respectively. HLB value of the surfactants blends were fixed at 10.185. The surfactants blend concentration is starting from 3% up to 19%. For each O/W emulsion set the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying the simple linear regression least squares method statistical analysis to the temperature-conductivity obtained data determines the effective surfactants blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by applying the physical stability centrifugation testing and the phase inversion temperature range measurements. The results indicated that, the relation which represents the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that, the most stable O/W emulsion is determined via the determination of the maximum R² value by applying of the simple linear regression least squares method to the temperature-conductivity obtained data up to 80°, in addition to, the true maximum slope is represented by the equation which has the maximum R² value. Because the conditions would be changed in a more complex formulation, the method of the determination of the effective surfactants blend concentration was verified by applying it for more complex formulations of 2% O/W miconazole nitrate cream and the results indicate its reproducibility.
Catalog of Air Force Weather Technical Documents, 1941-2006
2006-05-19
radiosondes in current use in USA. Elementary discussion of statistical terms and concepts used for expressing accuracy or error is discussed. AWS TR 105...Techniques, Appendix B: Vorticity—An Elementary Discussion of the Concept, August 1956, 27pp. Formerly AWSM 105– 50/1A. Provides the necessary back...steps involved in ordinary multiple linear regression. Conditional probability is calculated using transnormalized variables in the multivariate normal
Prediction system of hydroponic plant growth and development using algorithm Fuzzy Mamdani method
NASA Astrophysics Data System (ADS)
Sudana, I. Made; Purnawirawan, Okta; Arief, Ulfa Mediaty
2017-03-01
Hydroponics is a method of farming without soil. One of the Hydroponic plants is Watercress (Nasturtium Officinale). The development and growth process of hydroponic Watercress was influenced by levels of nutrients, acidity and temperature. The independent variables can be used as input variable system to predict the value level of plants growth and development. The prediction system is using Fuzzy Algorithm Mamdani method. This system was built to implement the function of Fuzzy Inference System (Fuzzy Inference System/FIS) as a part of the Fuzzy Logic Toolbox (FLT) by using MATLAB R2007b. FIS is a computing system that works on the principle of fuzzy reasoning which is similar to humans' reasoning. Basically FIS consists of four units which are fuzzification unit, fuzzy logic reasoning unit, base knowledge unit and defuzzification unit. In addition to know the effect of independent variables on the plants growth and development that can be visualized with the function diagram of FIS output surface that is shaped three-dimensional, and statistical tests based on the data from the prediction system using multiple linear regression method, which includes multiple linear regression analysis, T test, F test, the coefficient of determination and donations predictor that are calculated using SPSS (Statistical Product and Service Solutions) software applications.
Improving medium-range ensemble streamflow forecasts through statistical post-processing
NASA Astrophysics Data System (ADS)
Mendoza, Pablo; Wood, Andy; Clark, Elizabeth; Nijssen, Bart; Clark, Martyn; Ramos, Maria-Helena; Nowak, Kenneth; Arnold, Jeffrey
2017-04-01
Probabilistic hydrologic forecasts are a powerful source of information for decision-making in water resources operations. A common approach is the hydrologic model-based generation of streamflow forecast ensembles, which can be implemented to account for different sources of uncertainties - e.g., from initial hydrologic conditions (IHCs), weather forecasts, and hydrologic model structure and parameters. In practice, hydrologic ensemble forecasts typically have biases and spread errors stemming from errors in the aforementioned elements, resulting in a degradation of probabilistic properties. In this work, we compare several statistical post-processing techniques applied to medium-range ensemble streamflow forecasts obtained with the System for Hydromet Applications, Research and Prediction (SHARP). SHARP is a fully automated prediction system for the assessment and demonstration of short-term to seasonal streamflow forecasting applications, developed by the National Center for Atmospheric Research, University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation. The suite of post-processing techniques includes linear blending, quantile mapping, extended logistic regression, quantile regression, ensemble analogs, and the generalized linear model post-processor (GLMPP). We assess and compare these techniques using multi-year hindcasts in several river basins in the western US. This presentation discusses preliminary findings about the effectiveness of the techniques for improving probabilistic skill, reliability, discrimination, sharpness and resolution.
Patterns of shading tolerance determined from experimental ...
An extensive review of the experimental literature on seagrass shading evaluated the relationship between experimental light reductions, duration of experiment and seagrass response metrics to determine whether there were consistent statistical patterns. There were highly significant linear relationships of both percent biomass and percent shoot density reduction versus percent light reduction (versus controls), although unexplained variation in the data were high. Duration of exposure affected extent of response for both metrics, but was more clearly a factor in biomass response. Both biomass and shoot density showed linear responses to duration of light reduction for treatments 60%. Unexplained variation was again high, and greater for shoot density than biomass. With few exceptions, regressions of both biomass and shoot density on light reduction for individual species and for genera were statistically significant, but also tended to show high degrees of variability in data. Multivariate regressions that included both percent light reduction and duration of reduction as dependent variables increased the percentage of variation explained in almost every case. Analysis of response data by seagrass life history category (Colonizing, Opportunistic, Persistent) did not yield clearly separate response relationships in most cases. Biomass tended to show somewhat less variation in response to light reduction than shoot density, and of the two, may be the prefe
Almalik, Osama; Nijhuis, Michiel B; van den Heuvel, Edwin R
2014-01-01
Shelf-life estimation usually requires that at least three registration batches are tested for stability at multiple storage conditions. The shelf-life estimates are often obtained by linear regression analysis per storage condition, an approach implicitly suggested by ICH guideline Q1E. A linear regression analysis combining all data from multiple storage conditions was recently proposed in the literature when variances are homogeneous across storage conditions. The combined analysis is expected to perform better than the separate analysis per storage condition, since pooling data would lead to an improved estimate of the variation and higher numbers of degrees of freedom, but this is not evident for shelf-life estimation. Indeed, the two approaches treat the observed initial batch results, the intercepts in the model, and poolability of batches differently, which may eliminate or reduce the expected advantage of the combined approach with respect to the separate approach. Therefore, a simulation study was performed to compare the distribution of simulated shelf-life estimates on several characteristics between the two approaches and to quantify the difference in shelf-life estimates. In general, the combined statistical analysis does estimate the true shelf life more consistently and precisely than the analysis per storage condition, but it did not outperform the separate analysis in all circumstances.
Changes in aerobic power of men, ages 25-70 yr
NASA Technical Reports Server (NTRS)
Jackson, A. S.; Beard, E. F.; Wier, L. T.; Ross, R. M.; Stuteville, J. E.; Blair, S. N.
1995-01-01
This study quantified and compared the cross-sectional and longitudinal influence of age, self-report physical activity (SR-PA), and body composition (%fat) on the decline of maximal aerobic power (VO2peak). The cross-sectional sample consisted of 1,499 healthy men ages 25-70 yr. The 156 men of the longitudinal sample were from the same population and examined twice, the mean time between tests was 4.1 (+/- 1.2) yr. Peak oxygen uptake was determined by indirect calorimetry during a maximal treadmill exercise test. The zero-order correlations between VO2peak and %fat (r = -0.62) and SR-PA (r = 0.58) were significantly (P < 0.05) higher that the age correlation (r = -0.45). Linear regression defined the cross-sectional age-related decline in VO2peak at 0.46 ml.kg-1.min-1.yr-1. Multiple regression analysis (R = 0.79) showed that nearly 50% of this cross-sectional decline was due to %fat and SR-PA, adding these lifestyle variables to the multiple regression model reduced the age regression weight to -0.26 ml.kg-1.min-1.yr-1. Statistically controlling for time differences between tests, general linear models analysis showed that longitudinal changes in aerobic power were due to independent changes in %fat and SR-PA, confirming the cross-sectional results.
Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.
Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana
2012-02-01
In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P < 0.05 and P < 0.01) between the parameters of wheat dough studied by the Mixolab and its rheological properties measured with the Alveograph. A number of six predictive linear equations were obtained. Linear regression models gave multiple regression coefficients with R²(adjusted) > 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.
Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen
2015-05-01
Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
Correcting for population structure and kinship using the linear mixed model: theory and extensions.
Hoffman, Gabriel E
2013-01-01
Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
Huber, Stefan; Klein, Elise; Moeller, Korbinian; Willmes, Klaus
2015-10-01
In neuropsychological research, single-cases are often compared with a small control sample. Crawford and colleagues developed inferential methods (i.e., the modified t-test) for such a research design. In the present article, we suggest an extension of the methods of Crawford and colleagues employing linear mixed models (LMM). We first show that a t-test for the significance of a dummy coded predictor variable in a linear regression is equivalent to the modified t-test of Crawford and colleagues. As an extension to this idea, we then generalized the modified t-test to repeated measures data by using LMMs to compare the performance difference in two conditions observed in a single participant to that of a small control group. The performance of LMMs regarding Type I error rates and statistical power were tested based on Monte-Carlo simulations. We found that starting with about 15-20 participants in the control sample Type I error rates were close to the nominal Type I error rate using the Satterthwaite approximation for the degrees of freedom. Moreover, statistical power was acceptable. Therefore, we conclude that LMMs can be applied successfully to statistically evaluate performance differences between a single-case and a control sample. Copyright © 2015 Elsevier Ltd. All rights reserved.
Mixed models, linear dependency, and identification in age-period-cohort models.
O'Brien, Robert M
2017-07-20
This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Criteria for the use of regression analysis for remote sensing of sediment and pollutants
NASA Technical Reports Server (NTRS)
Whitlock, C. H.; Kuo, C. Y.; Lecroy, S. R.
1982-01-01
An examination of limitations, requirements, and precision of the linear multiple-regression technique for quantification of marine environmental parameters is conducted. Both environmental and optical physics conditions have been defined for which an exact solution to the signal response equations is of the same form as the multiple regression equation. Various statistical parameters are examined to define a criteria for selection of an unbiased fit when upwelled radiance values contain error and are correlated with each other. Field experimental data are examined to define data smoothing requirements in order to satisfy the criteria of Daniel and Wood (1971). Recommendations are made concerning improved selection of ground-truth locations to maximize variance and to minimize physical errors associated with the remote sensing experiment.
Borges, Germana Jayme; Ruiz, Luis Fernando Naldi; de Alencar, Ana Helena Gonçalves; Porto, Olavo César Lyra; Estrela, Carlos
2015-01-01
The objective of the present study was to assess cone-beam computed tomography (CBCT) as a diagnostic method for determination of gingival thickness (GT) and distance between gingival margin and vestibular (GMBC-V) and interproximal bone crests (GMBC-I). GT and GMBC-V were measured in 348 teeth and GMBC-I was measured in 377 tooth regions of 29 patients with gummy smile. GT was assessed using transgingival probing (TP), ultrasound (US), and CBCT, whereas GMBC-V and GMBC-I were assessed by transsurgical clinical evaluation (TCE) and CBCT. Statistical analyses used independent t-test, Pearson's correlation coefficient, and simple linear regression. Difference was observed for GT: between TP, CBCT, and US considering all teeth; between TP and CBCT and between TP and US in incisors and canines; between TP and US in premolars and first molars. TP presented the highest means for GT. Positive correlation and linear regression were observed between TP and CBCT, TP and US, and CBCT and US. Difference was observed for GMBC-V and GMBC-I using TCE and CBCT, considering all teeth. Correlation and linear regression results were significant for GMBC-V and GMBC-I in incisors, canines, and premolars. CBCT is an effective diagnostic method to visualize and measure GT, GMBC-V, and GMBC-I. PMID:25918737
Geographical variation of cerebrovascular disease in New York State: the correlation with income
Han, Daikwon; Carrow, Shannon S; Rogerson, Peter A; Munschauer, Frederick E
2005-01-01
Background Income is known to be associated with cerebrovascular disease; however, little is known about the more detailed relationship between cerebrovascular disease and income. We examined the hypothesis that the geographical distribution of cerebrovascular disease in New York State may be predicted by a nonlinear model using income as a surrogate socioeconomic risk factor. Results We used spatial clustering methods to identify areas with high and low prevalence of cerebrovascular disease at the ZIP code level after smoothing rates and correcting for edge effects; geographic locations of high and low clusters of cerebrovascular disease in New York State were identified with and without income adjustment. To examine effects of income, we calculated the excess number of cases using a non-linear regression with cerebrovascular disease rates taken as the dependent variable and income and income squared taken as independent variables. The resulting regression equation was: excess rate = 32.075 - 1.22*10-4(income) + 8.068*10-10(income2), and both income and income squared variables were significant at the 0.01 level. When income was included as a covariate in the non-linear regression, the number and size of clusters of high cerebrovascular disease prevalence decreased. Some 87 ZIP codes exceeded the critical value of the local statistic yielding a relative risk of 1.2. The majority of low cerebrovascular disease prevalence geographic clusters disappeared when the non-linear income effect was included. For linear regression, the excess rate of cerebrovascular disease falls with income; each $10,000 increase in median income of each ZIP code resulted in an average reduction of 3.83 observed cases. The significant nonlinear effect indicates a lessening of this income effect with increasing income. Conclusion Income is a non-linear predictor of excess cerebrovascular disease rates, with both low and high observed cerebrovascular disease rate areas associated with higher income. Income alone explains a significant amount of the geographical variance in cerebrovascular disease across New York State since both high and low clusters of cerebrovascular disease dissipate or disappear with income adjustment. Geographical modeling, including non-linear effects of income, may allow for better identification of other non-traditional risk factors. PMID:16242043
Calculating stage duration statistics in multistage diseases.
Komarova, Natalia L; Thalhauser, Craig J
2011-01-01
Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for statistical hypothesis testing needed to determine if treatment is having a significant effect on the progression, or if a new therapy is showing a delay of progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions, however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimations for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.
Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E
2015-05-01
The current study developed regression-based normative adjustments for a bi-factor model of the The Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation-alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe
2018-01-01
In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.
Proton radius from electron scattering data
NASA Astrophysics Data System (ADS)
Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent; Meekins, David; Norum, Blaine; Sawatzky, Brad
2016-05-01
Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon, and Stanford. Methods: We make use of stepwise regression techniques using the F test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data as well as to provide multivariate error estimates. Results: Starting with the precision, low four-momentum transfer (Q2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F test as well as the Akaike information criterion justify using a linear extrapolation which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on GE from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q2 data on GE to select functions which extrapolate to high Q2, we find that a Padé (N =M =1 ) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, GE(Q2) =(1+Q2/0.66 GeV2) -2 . Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm; either from linear extrapolation of the extremely-low-Q2 data or by use of the Padé approximant for extrapolation using a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering results and the muonic hydrogen results are consistent. It is the atomic hydrogen results that are the outliers.
Correlation of Vitamin D status and orthodontic-induced external apical root resorption.
Tehranchi, Azita; Sadighnia, Azin; Younessian, Farnaz; Abdi, Amir H; Shirvani, Armin
2017-01-01
Adequate Vitamin D is essential for dental and skeletal health in children and adult. The purpose of this study was to assess the correlation of serum Vitamin D level with external-induced apical root resorption (EARR) following fixed orthodontic treatment. In this cross-sectional study, the prevalence of Vitamin D deficiency (defined by25-hydroxyvitamin-D) was determined in 34 patients (23.5% male; age range 12-23 years; mean age 16.63 ± 2.84) treated with fixed orthodontic treatment. Root resorption of four maxillary incisors was measured using before and after periapical radiographs (136 measured teeth) by means of a design-to-purpose software to optimize data collection. Teeth with a maximum percentage of root resorption (%EARR) were indicated as representative root resorption for each patient. A multiple linear regression model and Pearson correlation coefficient were used to assess the association of Vitamin D status and observed EARR. P < 0.05 was considered statistically significant. The Pearson coefficient between these two variables was determined about 0.15 ( P = 0.38). Regression analysis revealed that Vitamin D status of the patients demonstrated no significant statistical correlation with EARR, after adjustment of confounding variables using linear regression model ( P > 0.05). This study suggests that Vitamin D level is not among the clinical variables that are potential contributors for EARR. The prevalence of Vitamin D deficiency does not differ in patients with higher EARR. These data suggest the possibility that Vitamin D insufficiency may not contribute to the development of more apical root resorption although this remains to be confirmed by further longitudinal cohort studies.
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun
2016-05-01
DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). © 2016 WILEY PERIODICALS, INC.
Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression
2006-03-01
Breusch - Pagan test , in which the null hypothesis states that the residuals have constant variance. The alternate hypothesis is that the residuals do not...variance, the Breusch - Pagan test provides statistical evidence that the assumption is justified. For the proposed model, the p-value is 0.173...entire test sample. v Acknowledgments First, I would like to acknowledge the influence and help of Greg Hoffman. His work served as the
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Are your covariates under control? How normalization can re-introduce covariate effects.
Pain, Oliver; Dudbridge, Frank; Ronald, Angelica
2018-04-30
Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfy the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach was tested in which rank-based INT was applied to the dependent variable before regressing covariates. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates, increasing type-I errors and reducing power. On the other hand, when rank-based INT was applied prior to controlling for covariate effects, residuals were normally distributed and linearly uncorrelated with covariates. This latter approach is therefore recommended in situations were normality of the dependent variable is required.
Forecasting daily patient volumes in the emergency department.
Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L
2008-02-01
Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
Ebhuoma, Osadolor; Gebreslasie, Michael
2016-06-14
Malaria is a serious public health threat in Sub-Saharan Africa (SSA), and its transmission risk varies geographically. Modelling its geographic characteristics is essential for identifying the spatial and temporal risk of malaria transmission. Remote sensing (RS) has been serving as an important tool in providing and assessing a variety of potential climatic/environmental malaria transmission variables in diverse areas. This review focuses on the utilization of RS-driven climatic/environmental variables in determining malaria transmission in SSA. A systematic search on Google Scholar and the Institute for Scientific Information (ISI) Web of Knowledge(SM) databases (PubMed, Web of Science and ScienceDirect) was carried out. We identified thirty-five peer-reviewed articles that studied the relationship between remotely-sensed climatic variable(s) and malaria epidemiological data in the SSA sub-regions. The relationship between malaria disease and different climatic/environmental proxies was examined using different statistical methods. Across the SSA sub-region, the normalized difference vegetation index (NDVI) derived from either the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) or Moderate-resolution Imaging Spectrometer (MODIS) satellite sensors was most frequently returned as a statistically-significant variable to model both spatial and temporal malaria transmission. Furthermore, generalized linear models (linear regression, logistic regression and Poisson regression) were the most frequently-employed methods of statistical analysis in determining malaria transmission predictors in East, Southern and West Africa. By contrast, multivariate analysis was used in Central Africa. We stress that the utilization of RS in determining reliable malaria transmission predictors and climatic/environmental monitoring variables would require a tailored approach that will have cognizance of the geographical/climatic setting, the stage of malaria elimination continuum, the characteristics of the RS variables and the analytical approach, which in turn, would support the channeling of intervention resources sustainably.
Upper extremity disorders in heavy industry workers in Greece.
Tsouvaltzidou, Thomaella; Alexopoulos, Evangelos; Fragkakis, Ioannis; Jelastopulu, Eleni
2017-06-18
To investigate the disability due to musculoskeletal disorders of the upper extremities in heavy industry workers. The population under study consisted of 802 employees, both white- and blue-collar, working in a shipyard industry in Athens, Greece. Data were collected through the distribution of questionnaires and the recording of individual and job-related characteristics during the period 2006-2009. The questionnaires used were the Quick Disabilities of the Arm, Shoulder and Hand (QD) Outcome Measure, the Work Ability Index (WAI) and the Short-Form-36 (SF-36) Health Survey. The QD was divided into three parameters - movement restrictions in everyday activities, work and sports/music activities - and the SF-36 into two items, physical and emotional. Multiple linear regression analysis was performed by means of the SPSS v.22 for Windows Statistical Package. The answers given by the participants for the QD did not reveal great discomfort regarding the execution of manual tasks, with the majority of the participants scoring under 5%, meaning no disability. After conducting multiple linear regression, age revealed a positive association with the parameter of restrictions in everyday activities (b = 0.64, P = 0.000). Basic education showed a statistically significant association regarding restrictions during leisure activities, with b = 2.140 ( P = 0.029) for compulsory education graduates. WAI's final score displayed negative charging in the regression analysis of all three parameters, with b = -0.142 ( P = 0.0), b = -0.099 ( P = 0.055) and b = -0.376 ( P = 0.001) respectively, while the physical and emotional components of SF-36 associated with movement restrictions only in daily activities and work. The participants' specialty made no statistically significant associations with any of the three parameters of the QD. Increased musculoskeletal disorders of the upper extremity are associated with older age, lower basic education and physical and mental/emotional health and reduced working ability.
Ebhuoma, Osadolor; Gebreslasie, Michael
2016-01-01
Malaria is a serious public health threat in Sub-Saharan Africa (SSA), and its transmission risk varies geographically. Modelling its geographic characteristics is essential for identifying the spatial and temporal risk of malaria transmission. Remote sensing (RS) has been serving as an important tool in providing and assessing a variety of potential climatic/environmental malaria transmission variables in diverse areas. This review focuses on the utilization of RS-driven climatic/environmental variables in determining malaria transmission in SSA. A systematic search on Google Scholar and the Institute for Scientific Information (ISI) Web of KnowledgeSM databases (PubMed, Web of Science and ScienceDirect) was carried out. We identified thirty-five peer-reviewed articles that studied the relationship between remotely-sensed climatic variable(s) and malaria epidemiological data in the SSA sub-regions. The relationship between malaria disease and different climatic/environmental proxies was examined using different statistical methods. Across the SSA sub-region, the normalized difference vegetation index (NDVI) derived from either the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) or Moderate-resolution Imaging Spectrometer (MODIS) satellite sensors was most frequently returned as a statistically-significant variable to model both spatial and temporal malaria transmission. Furthermore, generalized linear models (linear regression, logistic regression and Poisson regression) were the most frequently-employed methods of statistical analysis in determining malaria transmission predictors in East, Southern and West Africa. By contrast, multivariate analysis was used in Central Africa. We stress that the utilization of RS in determining reliable malaria transmission predictors and climatic/environmental monitoring variables would require a tailored approach that will have cognizance of the geographical/climatic setting, the stage of malaria elimination continuum, the characteristics of the RS variables and the analytical approach, which in turn, would support the channeling of intervention resources sustainably. PMID:27314369
Nejaim, Yuri; Aps, Johan K M; Groppo, Francisco Carlos; Haiter Neto, Francisco
2018-06-01
The purpose of this article was to evaluate the pharyngeal space volume, and the size and shape of the mandible and the hyoid bone, as well as their relationships, in patients with different facial types and skeletal classes. Furthermore, we estimated the volume of the pharyngeal space with a formula using only linear measurements. A total of 161 i-CAT Next Generation (Imaging Sciences International, Hatfield, Pa) cone-beam computed tomography images (80 men, 81 women; ages, 21-58 years; mean age, 27 years) were retrospectively studied. Skeletal class and facial type were determined for each patient from multiplanar reconstructions using the NemoCeph software (Nemotec, Madrid, Spain). Linear and angular measurements were performed using 3D imaging software (version 3.4.3; Carestream Health, Rochester, NY), and volumetric analysis of the pharyngeal space was carried out with ITK-SNAP (version 2.4.0; Cognitica, Philadelphia, Pa) segmentation software. For the statistics, analysis of variance and the Tukey test with a significance level of 0.05, Pearson correlation, and linear regression were used. The pharyngeal space volume, when correlated with mandible and hyoid bone linear and angular measurements, showed significant correlations with skeletal class or facial type. The linear regression performed to estimate the volume of the pharyngeal space showed an R of 0.92 and an adjusted R 2 of 0.8362. There were significant correlations between pharyngeal space volume, and the mandible and hyoid bone measurements, suggesting that the stomatognathic system should be evaluated in an integral and nonindividualized way. Furthermore, it was possible to develop a linear regression model, resulting in a useful formula for estimating the volume of the pharyngeal space. Copyright © 2018 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
Van de Voorde, Tim; Vlaeminck, Jeroen; Canters, Frank
2008-01-01
Urban growth and its related environmental problems call for sustainable urban management policies to safeguard the quality of urban environments. Vegetation plays an important part in this as it provides ecological, social, health and economic benefits to a city's inhabitants. Remotely sensed data are of great value to monitor urban green and despite the clear advantages of contemporary high resolution images, the benefits of medium resolution data should not be discarded. The objective of this research was to estimate fractional vegetation cover from a Landsat ETM+ image with sub-pixel classification, and to compare accuracies obtained with multiple stepwise regression analysis, linear spectral unmixing and multi-layer perceptrons (MLP) at the level of meaningful urban spatial entities. Despite the small, but nevertheless statistically significant differences at pixel level between the alternative approaches, the spatial pattern of vegetation cover and estimation errors is clearly distinctive at neighbourhood level. At this spatially aggregated level, a simple regression model appears to attain sufficient accuracy. For mapping at a spatially more detailed level, the MLP seems to be the most appropriate choice. Brightness normalisation only appeared to affect the linear models, especially the linear spectral unmixing. PMID:27879914
NASA Astrophysics Data System (ADS)
Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.
2016-01-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
NASA Astrophysics Data System (ADS)
dos Santos, T. S.; Mendes, D.; Torres, R. R.
2015-08-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model out- put and observed monthly precipitation. We used GCMs experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.
Predicting major element mineral/melt equilibria - A statistical approach
NASA Technical Reports Server (NTRS)
Hostetler, C. J.; Drake, M. J.
1980-01-01
Empirical equations have been developed for calculating the mole fractions of NaO0.5, MgO, AlO1.5, SiO2, KO0.5, CaO, TiO2, and FeO in a solid phase of initially unknown identity given only the composition of the coexisting silicate melt. The approach involves a linear multivariate regression analysis in which solid composition is expressed as a Taylor series expansion of the liquid compositions. An internally consistent precision of approximately 0.94 is obtained, that is, the nature of the liquidus phase in the input data set can be correctly predicted for approximately 94% of the entries. The composition of the liquidus phase may be calculated to better than 5 mol % absolute. An important feature of this 'generalized solid' model is its reversibility; that is, the dependent and independent variables in the linear multivariate regression may be inverted to permit prediction of the composition of a silicate liquid produced by equilibrium partial melting of a polymineralic source assemblage.
Wang, Hai-Mei; Li, Zheng-Hai; Wang, Zhen
2013-01-01
Based on the monthly temperature and precipitation data of 15 meteorological stations and the statistical data of livestock density in Xilinguole League in 1981-2007, and by using ArcGIS, this paper analyzed the spatial distribution of the climate aridity and livestock density in the League, and in combining with the ten-day data of the normalized difference vegetation index (NDVI) in 1981-2007, the driving factors of the vegetation cover change in the League were discussed. In the study period, there was a satisfactory linear regression relationship between the climate aridity and the vegetation coverage. The NDVI and the livestock density had a favorable binomial regression relationship. With the increase of NDVI, the livestock density increased first and decreased then. The vegetation coverage had a complex linear relationship with livestock density and climate aridity. The NDVI had a positive correlation with climate aridity, but a negative correlation with livestock density. Compared with livestock density, climate aridity had far greater effects on the NDVI.
Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.
Ferrari, Alberto
2017-01-01
Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.
Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W
2005-01-01
Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Square (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from MOE package in combination with E-state or ISIS keys have been used. The best model was obtained using linear PLS for a combination between 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model validated on a test set of 177 compounds not included in the training set has r2 0.911 and RMSE 0.475 log S(w). The descriptors were ranked according to their importance, and at the top of the list have been found the 22 MOE descriptors. The CR model produced results as good as PLS, and because of the way in which cross-validation has been done it is expected to be a valuable tool in prediction besides PLS model. The statistics obtained using nonlinear methods did not surpass those got with linear ones. The good statistic obtained for linear PLS and CR recommends these models to be used in prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient leads optimization.
Liu, Hongcheng; Yao, Tao; Li, Runze; Ye, Yinyu
2017-11-01
This paper concerns the folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses on local minimizers or on specific FCPSLR-based learning algorithms, it still remains open questions whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure the statistical performance, and whether that statistical performance can be non-contingent on the specific designs of computing procedures. To address the questions, this paper presents the following threefold results: (i) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (ii) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S 3 ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S 3 ONC solution likely recovers the oracle solution. This result also explicates that the goal of improving the statistical performance is consistent with the optimization criteria of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (iii) We apply (ii) to the special case of FCPSLR with minimax concave penalty (MCP) and show that under the restricted eigenvalue condition, any S 3 ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to the Lasso in general. Furthermore, to guarantee the S 3 ONC admits FPTAS.
NASA Astrophysics Data System (ADS)
Barman, S.; Bhattacharjya, R. K.
2017-12-01
The River Subansiri is the major north bank tributary of river Brahmaputra. It originates from the range of Himalayas beyond the Great Himalayan range at an altitude of approximately 5340m. Subansiri basin extends from tropical to temperate zones and hence exhibits a great diversity in rainfall characteristics. In the Northern and Central Himalayan tracts, precipitation is scarce on account of high altitudes. On the other hand, Southeast part of the Subansiri basin comprising the sub-Himalayan and the plain tract in Arunachal Pradesh and Assam, lies in the tropics. Due to Northeast as well as Southwest monsoon, precipitation occurs in this region in abundant quantities. Particularly, Southwest monsoon causes very heavy precipitation in the entire Subansiri basin during May to October. In this study, the rainfall over Subansiri basin has been studied at 24 different locations by multiple linear and non-linear regression based statistical downscaling techniques and by Artificial Neural Network based model. APHRODITE's gridded rainfall data of 0.25˚ x 0.25˚ resolutions and climatic parameters of HadCM3 GCM of resolution 2.5˚ x 3.75˚ (latitude by longitude) have been used in this study. It has been found that multiple non-linear regression based statistical downscaling technique outperformed the other techniques. Using this method, the future rainfall pattern over the Subansiri basin has been analyzed up to the year 2099 for four different time periods, viz., 2020-39, 2040-59, 2060-79, and 2080-99 at all the 24 locations. On the basis of historical rainfall, the months have been categorized as wet months, months with moderate rainfall and dry months. The spatial changes in rainfall patterns for all these three types of months have also been analyzed over the basin. Potential decrease of rainfall in the wet months and months with moderate rainfall and increase of rainfall in the dry months are observed for the future rainfall pattern of the Subansiri basin.
Molla, Azaher Ali; Chi, Chunhuei; Mondaca, Alicia Lorena Núñez
2017-01-31
Predictors of high out-of-pocket household healthcare expenditure are essential for creating effective health system finance policy. In Bangladesh, 63.3% of health expenditure is out-of-pocket and born by households. It is imperative to know what determines household health expenditure. This study aims to investigate the predicting factors of high out-of-pocket household healthcare expenditure targeting to put forward policy recommendations on equity in financial burden. Bangladesh household income and expenditure survey 2010 provides data for this study. Predictors of high out-of-pocket household healthcare expenditure were analyzed using multiple linear regressions. We have modeled non-linear relationship using logarithmic form of linear regression. Heteroscedasticity and multicollinearity were checked using Breusch-Pagan/Cook-Weishberg and VIF tests. Normality of the residuals was checked using Kernel density curve. We applied required adjustment for survey data, so that standard errors and parameters estimation are valid. Presence of chronic disease and household income were found to be the most influential and statistically significant (p < 0.001) predictors of high household healthcare expenditure. Households in rural areas spend 7% less than urban dwellers. The results show that a 100% increase in female members in a family leads to a 2% decrease in household health expenditure. Household income, health shocks in families, and family size are other statistically significant predictors of household healthcare expenditure. Proportion of elderly and under-five members in the family show some positive influence on health expenditure, though statistically nonsignificant. The findings call for emphasizing prevention of chronic diseases, as it is a strong predictor of household health expenditure. Innovative insurance scheme needs to be devised to prevent household from being impoverished due to health shocks in the family. Policy makers are urged to design an alternative source of healthcare financing in Bangladesh to minimize the burden of high OOP healthcare expenditure.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
NASA Astrophysics Data System (ADS)
Jung, Jae Hong; Jung, Joo-Young; Cho, Kwang Hwan; Ryu, Mi Ryeong; Bae, Sun Hyun; Moon, Seong Kwon; Kim, Yong Ho; Choe, Bo-Young; Suh, Tae Suk
2017-02-01
The purpose of this study was to analyze the glottis rotational error (GRE) by using a thermoplastic mask for patients with the glottic cancer undergoing intensity-modulated radiation therapy (IMRT). We selected 20 patients with glottic cancer who had received IMRT by using the tomotherapy. The image modalities with both kilovoltage computed tomography (planning kVCT) and megavoltage CT (daily MVCT) images were used for evaluating the error. Six anatomical landmarks in the image were defined to evaluate a correlation between the absolute GRE (°) and the length of contact with the underlying skin of the patient by the mask (mask, mm). We also statistically analyzed the results by using the Pearson's correlation coefficient and a linear regression analysis ( P <0.05). The mask and the absolute GRE were verified to have a statistical correlation ( P < 0.01). We found a statistical significance for each parameter in the linear regression analysis (mask versus absolute roll: P = 0.004 [ P < 0.05]; mask versus 3D-error: P = 0.000 [ P < 0.05]). The range of the 3D-errors with contact by the mask was from 1.2% - 39.7% between the maximumand no-contact case in this study. A thermoplastic mask with a tight, increased contact area may possibly contribute to the uncertainty of the reproducibility as a variation of the absolute GRE. Thus, we suggest that a modified mask, such as one that covers only the glottis area, can significantly reduce the patients' setup errors during the treatment.
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Penna, M.L.; Duchiade, M.P.
The authors report the results of an investigation into the possible association between air pollution and infant mortality from pneumonia in the Rio de Janeiro Metropolitan Area. This investigation employed multiple linear regression analysis (stepwise method) for infant mortality from pneumonia in 1980, including the study population's areas of residence, incomes, and pollution exposure as independent variables. With the income variable included in the regression, a statistically significant association was observed between the average annual level of particulates and infant mortality from pneumonia. While this finding should be accepted with caution, it does suggest a biological association between these variables.more » The authors' conclusion is that air quality indicators should be included in studies of acute respiratory infections in developing countries.« less
Arsenyev, P A; Trezvov, V V; Saratovskaya, N V
1997-01-01
This work represents a method, which allows to determine phase composition of calcium hydroxylapatite basing on its infrared spectrum. The method uses factor analysis of the spectral data of calibration set of samples to determine minimal number of factors required to reproduce the spectra within experimental error. Multiple linear regression is applied to establish correlation between factor scores of calibration standards and their properties. The regression equations can be used to predict the property value of unknown sample. The regression model was built for determination of beta-tricalcium phosphate content in hydroxylapatite. Statistical estimation of quality of the model was carried out. Application of the factor analysis on spectral data allows to increase accuracy of beta-tricalcium phosphate determination and expand the range of determination towards its less concentration. Reproducibility of results is retained.
Marginal regression analysis of recurrent events with coarsened censoring times.
Hu, X Joan; Rosychuk, Rhonda J
2016-12-01
Motivated by an ongoing pediatric mental health care (PMHC) study, this article presents weakly structured methods for analyzing doubly censored recurrent event data where only coarsened information on censoring is available. The study extracted administrative records of emergency department visits from provincial health administrative databases. The available information of each individual subject is limited to a subject-specific time window determined up to concealed data. To evaluate time-dependent effect of exposures, we adapt the local linear estimation with right censored survival times under the Cox regression model with time-varying coefficients (cf. Cai and Sun, Scandinavian Journal of Statistics 2003, 30, 93-111). We establish the pointwise consistency and asymptotic normality of the regression parameter estimator, and examine its performance by simulation. The PMHC study illustrates the proposed approach throughout the article. © 2016, The International Biometric Society.
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Statistical and Biophysical Models for Predicting Total and Outdoor Water Use in Los Angeles
NASA Astrophysics Data System (ADS)
Mini, C.; Hogue, T. S.; Pincetl, S.
2012-04-01
Modeling water demand is a complex exercise in the choice of the functional form, techniques and variables to integrate in the model. The goal of the current research is to identify the determinants that control total and outdoor residential water use in semi-arid cities and to utilize that information in the development of statistical and biophysical models that can forecast spatial and temporal urban water use. The City of Los Angeles is unique in its highly diverse socio-demographic, economic and cultural characteristics across neighborhoods, which introduces significant challenges in modeling water use. Increasing climate variability also contributes to uncertainties in water use predictions in urban areas. Monthly individual water use records were acquired from the Los Angeles Department of Water and Power (LADWP) for the 2000 to 2010 period. Study predictors of residential water use include socio-demographic, economic, climate and landscaping variables at the zip code level collected from US Census database. Climate variables are estimated from ground-based observations and calculated at the centroid of each zip code by inverse-distance weighting method. Remotely-sensed products of vegetation biomass and landscape land cover are also utilized. Two linear regression models were developed based on the panel data and variables described: a pooled-OLS regression model and a linear mixed effects model. Both models show income per capita and the percentage of landscape areas in each zip code as being statistically significant predictors. The pooled-OLS model tends to over-estimate higher water use zip codes and both models provide similar RMSE values.Outdoor water use was estimated at the census tract level as the residual between total water use and indoor use. This residual is being compared with the output from a biophysical model including tree and grass cover areas, climate variables and estimates of evapotranspiration at very high spatial resolution. A genetic algorithm based model (Shuffled Complex Evolution-UA; SCE-UA) is also being developed to provide estimates of the predictions and parameters uncertainties and to compare against the linear regression models. Ultimately, models will be selected to undertake predictions for a range of climate change and landscape scenarios. Finally, project results will contribute to a better understanding of water demand to help predict future water use and implement targeted landscaping conservation programs to maintain sustainable water needs for a growing population under uncertain climate variability.
Experimental statistics for biological sciences.
Bang, Heejung; Davidian, Marie
2010-01-01
In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression," which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter, readers would understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text would be most useful if it is used as a supplemental material, while the readers take their own statistical courses or it would serve as a great reference text associated with a manual for any statistical software as a self-teaching guide.
Methods for estimating low-flow statistics for Massachusetts streams
Ries, Kernell G.; Friesz, Paul J.
2000-01-01
Methods and computer software are described in this report for determining flow duration, low-flow frequency statistics, and August median flows. These low-flow statistics can be estimated for unregulated streams in Massachusetts using different methods depending on whether the location of interest is at a streamgaging station, a low-flow partial-record station, or an ungaged site where no data are available. Low-flow statistics for streamgaging stations can be estimated using standard U.S. Geological Survey methods described in the report. The MOVE.1 mathematical method and a graphical correlation method can be used to estimate low-flow statistics for low-flow partial-record stations. The MOVE.1 method is recommended when the relation between measured flows at a partial-record station and daily mean flows at a nearby, hydrologically similar streamgaging station is linear, and the graphical method is recommended when the relation is curved. Equations are presented for computing the variance and equivalent years of record for estimates of low-flow statistics for low-flow partial-record stations when either a single or multiple index stations are used to determine the estimates. The drainage-area ratio method or regression equations can be used to estimate low-flow statistics for ungaged sites where no data are available. The drainage-area ratio method is generally as accurate as or more accurate than regression estimates when the drainage-area ratio for an ungaged site is between 0.3 and 1.5 times the drainage area of the index data-collection site. Regression equations were developed to estimate the natural, long-term 99-, 98-, 95-, 90-, 85-, 80-, 75-, 70-, 60-, and 50-percent duration flows; the 7-day, 2-year and the 7-day, 10-year low flows; and the August median flow for ungaged sites in Massachusetts. Streamflow statistics and basin characteristics for 87 to 133 streamgaging stations and low-flow partial-record stations were used to develop the equations. The streamgaging stations had from 2 to 81 years of record, with a mean record length of 37 years. The low-flow partial-record stations had from 8 to 36 streamflow measurements, with a median of 14 measurements. All basin characteristics were determined from digital map data. The basin characteristics that were statistically significant in most of the final regression equations were drainage area, the area of stratified-drift deposits per unit of stream length plus 0.1, mean basin slope, and an indicator variable that was 0 in the eastern region and 1 in the western region of Massachusetts. The equations were developed by use of weighted-least-squares regression analyses, with weights assigned proportional to the years of record and inversely proportional to the variances of the streamflow statistics for the stations. Standard errors of prediction ranged from 70.7 to 17.5 percent for the equations to predict the 7-day, 10-year low flow and 50-percent duration flow, respectively. The equations are not applicable for use in the Southeast Coastal region of the State, or where basin characteristics for the selected ungaged site are outside the ranges of those for the stations used in the regression analyses. A World Wide Web application was developed that provides streamflow statistics for data collection stations from a data base and for ungaged sites by measuring the necessary basin characteristics for the site and solving the regression equations. Output provided by the Web application for ungaged sites includes a map of the drainage-basin boundary determined for the site, the measured basin characteristics, the estimated streamflow statistics, and 90-percent prediction intervals for the estimates. An equation is provided for combining regression and correlation estimates to obtain improved estimates of the streamflow statistics for low-flow partial-record stations. An equation is also provided for combining regression and drainage-area ratio estimates to obtain improved e
NASA Astrophysics Data System (ADS)
Kumar, Vandhna; Meyssignac, Benoit; Melet, Angélique; Ganachaud, Alexandre
2017-04-01
Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years is up to 3 times the global average. In this study, we attempt to reconstruct sea levels at selected sites in the region (Suva, Lautoka, Noumea - Fiji and New Caledonia) as a mutiple-linear regression of atmospheric and oceanic variables. We focus on interannual-to-decadal scale variability, and lower (including the global mean sea level rise) over the 1979-2014 period. Sea levels are taken from tide gauge records and the ORAS4 reanalysis dataset, and are expressed as a sum of steric and mass changes as a preliminary step. The key development in our methodology is using leading wind stress curl as a proxy for the thermosteric component. This is based on the knowledge that wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. The analysis is primarily based on correlation between local sea level and selected predictors, the dominant one being wind stress curl. In the first step, proxy boxes for wind stress curl are determined via regions of highest correlation. The proportion of sea level explained via linear regression is then removed, leaving a residual. This residual is then correlated with other locally acting potential predictors: halosteric sea level, the zonal and meridional wind stress components, and sea surface temperature. The statistically significant predictors are used in a multi-linear regression function to simulate the observed sea level. The method is able to reproduce between 40 to 80% of the variance in observed sea level. Based on the skill of the model, it has high potential in sea level projection and downscaling studies.
NASA Astrophysics Data System (ADS)
Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari
2018-01-01
Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg/day) in Jenderan catchment area.
Salas, Eric Ariel L; Valdez, Raul; Michel, Stefan
2017-11-01
We modeled summer and winter habitat suitability of Marco Polo argali in the Pamir Mountains in southeastern Tajikistan using these statistical algorithms: Generalized Linear Model, Random Forest, Boosted Regression Tree, Maxent, and Multivariate Adaptive Regression Splines. Using sheep occurrence data collected from 2009 to 2015 and a set of selected habitat predictors, we produced summer and winter habitat suitability maps and determined the important habitat suitability predictors for both seasons. Our results demonstrated that argali selected proximity to riparian areas and greenness as the two most relevant variables for summer, and the degree of slope (gentler slopes between 0° to 20°) and Landsat temperature band for winter. The terrain roughness was also among the most important variables in summer and winter models. Aspect was only significant for winter habitat, with argali preferring south-facing mountain slopes. We evaluated various measures of model performance such as the Area Under the Curve (AUC) and the True Skill Statistic (TSS). Comparing the five algorithms, the AUC scored highest for Boosted Regression Tree in summer (AUC = 0.94) and winter model runs (AUC = 0.94). In contrast, Random Forest underperformed in both model runs.
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics was done through the influence plot which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
On the Stationarity of Multiple Autoregressive Approximants: Theory and Algorithms
1976-08-01
a I (3.4) Hannan and Terrell (1972) consider problems of a similar nature. Efficient estimates A(1),... , A(p) , and i of A(1)... ,A(p) and...34Autoregressive model fitting for control, Ann . Inst. Statist. Math., 23, 163-180. Hannan, E. J. (1970), Multiple Time Series, New York, John Wiley...Hannan, E. J. and Terrell , R. D. (1972), "Time series regression with linear constraints, " International Economic Review, 13, 189-200. Masani, P
NASA Astrophysics Data System (ADS)
Grotti, Marco; Abelmoschi, Maria Luisa; Soggia, Francesco; Tiberiade, Christian; Frache, Roberto
2000-12-01
The multivariate effects of Na, K, Mg and Ca as nitrates on the electrothermal atomisation of manganese, cadmium and iron were studied by multiple linear regression modelling. Since the models proved to efficiently predict the effects of the considered matrix elements in a wide range of concentrations, they were applied to correct the interferences occurring in the determination of trace elements in seawater after pre-concentration of the analytes. In order to obtain a statistically significant number of samples, a large volume of the certified seawater reference materials CASS-3 and NASS-3 was treated with Chelex-100 resin; then, the chelating resin was separated from the solution, divided into several sub-samples, each of them was eluted with nitric acid and analysed by electrothermal atomic absorption spectrometry (for trace element determinations) and inductively coupled plasma optical emission spectrometry (for matrix element determinations). To minimise any other systematic error besides that due to matrix effects, accuracy of the pre-concentration step and contamination levels of the procedure were checked by inductively coupled plasma mass spectrometric measurements. Analytical results obtained by applying the multiple linear regression models were compared with those obtained with other calibration methods, such as external calibration using acid-based standards, external calibration using matrix-matched standards and the analyte addition technique. Empirical models proved to efficiently reduce interferences occurring in the analysis of real samples, allowing an improvement of accuracy better than for other calibration methods.
Factors contributing to tooth loss among the elderly: A cross sectional study.
Natto, Zuhair S; Aladmawy, Majdi; Alasqah, Mohammed; Papas, Athena
2014-12-01
The present study evaluates the influence of several demographic, health, personal, and clinical factors on the number of missing teeth in old age sample. The number of patients included was 259; they received a full mouth examination and answered a questionnaire provided by one examiner. All the variables related to teeth loss based on the literature were included. These variables focused on age, gender, race, marital status, clinical attachment level, pocket depth, year of smoking, number of cigarettes smoked per day, number of medications, root decay, coronal decay, health status, and year of education. Statistical analysis involved stepwise multivariate linear regression. Teeth loss was statistically associated with clinical attachment level (CAL)(p value 0.0001), pocket depth (PD) (0.0007) and education level (0.0048). When smoking was included in the model, age was significantly associated with teeth loss (0.0037). At least one of these four factors was also related to teeth loss in several specific groups such as diabetes mellitus, male, and White. The multiple linear regressions for all the proposed variables showed that they contributed to teeth loss by about 23%. It can be concluded that less education or increased clinical attachment level loss may increase number of missing teeth. Additionally, age may cause teeth loss in the presence of smoking. Copyright © 2014. Published by Elsevier B.V.
NASA Astrophysics Data System (ADS)
García-Díaz, J. Carlos
2009-11-01
Fault detection and diagnosis is an important problem in process engineering. Process equipments are subject to malfunctions during operation. Galvanized steel is a value added product, furnishing effective performance by combining the corrosion resistance of zinc with the strength and formability of steel. Fault detection and diagnosis is an important problem in continuous hot dip galvanizing and the increasingly stringent quality requirements in automotive industry has also demanded ongoing efforts in process control to make the process more robust. When faults occur, they change the relationship among these observed variables. This work compares different statistical regression models proposed in the literature for estimating the quality of galvanized steel coils on the basis of short time histories. Data for 26 batches were available. Five variables were selected for monitoring the process: the steel strip velocity, four bath temperatures and bath level. The entire data consisting of 48 galvanized steel coils was divided into sets. The first training data set was 25 conforming coils and the second data set was 23 nonconforming coils. Logistic regression is a modeling tool in which the dependent variable is categorical. In most applications, the dependent variable is binary. The results show that the logistic generalized linear models do provide good estimates of quality coils and can be useful for quality control in manufacturing process.
Rostami, Reza; Sadeghi, Vahid; Zarei, Jamileh; Haddadi, Parvaneh; Mohazzab-Torabi, Saman; Salamati, Payman
2013-04-01
The aim of this study was to compare the Persian version of the wechsler intelligence scale for children - fourth edition (WISC-IV) and cognitive assessment system (CAS) tests, to determine the correlation between their scales and to evaluate the probable concurrent validity of these tests in patients with learning disorders. One-hundered-sixty-two children with learning disorder who were presented at Atieh Comprehensive Psychiatry Center were selected in a consecutive non-randomized order. All of the patients were assessed based on WISC-IV and CAS scores questionnaires. Pearson correlation coefficient was used to analyze the correlation between the data and to assess the concurrent validity of the two tests. Linear regression was used for statistical modeling. The type one error was considered 5% in maximum. There was a strong correlation between total score of WISC-IV test and total score of CAS test in the patients (r=0.75, P<0.001). The correlations among the other scales were mostly high and all of them were statistically significant (P<0.001). A linear regression model was obtained (α = 0.51, β = 0.81 and P<0.001). There is an acceptable correlation between the WISC-IV scales and CAS test in children with learning disorders. A concurrent validity is established between the two tests and their scales.
Durand, Casey P
2013-01-01
Statistical interactions are a common component of data analysis across a broad range of scientific disciplines. However, the statistical power to detect interactions is often undesirably low. One solution is to elevate the Type 1 error rate so that important interactions are not missed in a low power situation. To date, no study has quantified the effects of this practice on power in a linear regression model. A Monte Carlo simulation study was performed. A continuous dependent variable was specified, along with three types of interactions: continuous variable by continuous variable; continuous by dichotomous; and dichotomous by dichotomous. For each of the three scenarios, the interaction effect sizes, sample sizes, and Type 1 error rate were varied, resulting in a total of 240 unique simulations. In general, power to detect the interaction effect was either so low or so high at α = 0.05 that raising the Type 1 error rate only served to increase the probability of including a spurious interaction in the model. A small number of scenarios were identified in which an elevated Type 1 error rate may be justified. Routinely elevating Type 1 error rate when testing interaction effects is not an advisable practice. Researchers are best served by positing interaction effects a priori and accounting for them when conducting sample size calculations.
Analysis of methods to estimate spring flows in a karst aquifer
Sepulveda, N.
2009-01-01
Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer. ?? 2008 National Ground Water Association.
Analysis of methods to estimate spring flows in a karst aquifer.
Sepúlveda, Nicasio
2009-01-01
Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer.
Rostami, Reza; Sadeghi, Vahid; Zarei, Jamileh; Haddadi, Parvaneh; Mohazzab-Torabi, Saman; Salamati, Payman
2013-01-01
Objective The aim of this study was to compare the Persian version of the wechsler intelligence scale for children - fourth edition (WISC-IV) and cognitive assessment system (CAS) tests, to determine the correlation between their scales and to evaluate the probable concurrent validity of these tests in patients with learning disorders. Methods One-hundered-sixty-two children with learning disorder who were presented at Atieh Comprehensive Psychiatry Center were selected in a consecutive non-randomized order. All of the patients were assessed based on WISC-IV and CAS scores questionnaires. Pearson correlation coefficient was used to analyze the correlation between the data and to assess the concurrent validity of the two tests. Linear regression was used for statistical modeling. The type one error was considered 5% in maximum. Findings There was a strong correlation between total score of WISC-IV test and total score of CAS test in the patients (r=0.75, P<0.001). The correlations among the other scales were mostly high and all of them were statistically significant (P<0.001). A linear regression model was obtained (α = 0.51, β = 0.81 and P<0.001). Conclusion There is an acceptable correlation between the WISC-IV scales and CAS test in children with learning disorders. A concurrent validity is established between the two tests and their scales. PMID:23724180
Effects of metal- and fiber-reinforced composite root canal posts on flexural properties.
Kim, Su-Hyeon; Oh, Tack-Oon; Kim, Ju-Young; Park, Chun-Woong; Baek, Seung-Ho; Park, Eun-Seok
2016-01-01
The aim of this study was to observe the effects of different test conditions on the flexural properties of root canal post. Metal- and fiber-reinforced composite root canal posts of various diameters were measured to determine flexural properties using a threepoint bending test at different conditions. In this study, the span length/post diameter ratio of root canal posts varied from 3.0 to 10.0. Multiple regression models for maximum load as a dependent variable were statistically significant. The models for flexural properties as dependent variables were statistically significant, but linear regression models could not be fitted to data sets. At a low span length/post diameter ratio, the flexural properties were distorted by occurrence of shear stress in short samples. It was impossible to obtain high span length/post diameter ratio with root canal posts. The addition of parameters or coefficients is necessary to appropriately represent the flexural properties of root canal posts.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
NASA Astrophysics Data System (ADS)
Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin
2016-03-01
How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built based on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients were included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (VFA) and visceral fat areas (SFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were performed respectively to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that using all 3 statistical models, a statistically significant association was detected between the model-generated results and both of the two clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there were no significant association for both PFS and OS in the group of patients without receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using quantitative adiposity-related CT image features based statistical prediction models to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
Body mass index in relation to serum prostate-specific antigen levels and prostate cancer risk.
Bonn, Stephanie E; Sjölander, Arvid; Tillander, Annika; Wiklund, Fredrik; Grönberg, Henrik; Bälter, Katarina
2016-07-01
High Body mass index (BMI) has been directly associated with risk of aggressive or fatal prostate cancer. One possible explanation may be an effect of BMI on serum levels of prostate-specific antigen (PSA). To study the association between BMI and serum PSA as well as prostate cancer risk, a large cohort of men without prostate cancer at baseline was followed prospectively for prostate cancer diagnoses until 2015. Serum PSA and BMI were assessed among 15,827 men at baseline in 2010-2012. During follow-up, 735 men were diagnosed with prostate cancer with 282 (38.4%) classified as high-grade cancers. Multivariable linear regression models and natural cubic linear regression splines were fitted for analyses of BMI and log-PSA. For risk analysis, Cox proportional hazards regression models were used to estimate hazard ratios (HR) and 95% confidence intervals (CI) and natural cubic Cox regression splines producing standardized cancer-free probabilities were fitted. Results showed that baseline Serum PSA decreased by 1.6% (95% CI: -2.1 to -1.1) with every one unit increase in BMI. Statistically significant decreases of 3.7, 11.7 and 32.3% were seen for increasing BMI-categories of 25 < 30, 30 < 35 and ≥35 kg/m(2), respectively, compared to the reference (18.5 < 25 kg/m(2)). No statistically significant associations were seen between BMI and prostate cancer risk although results were indicative of a positive association to incidence rates of high-grade disease and an inverse association to incidence of low-grade disease. However, findings regarding risk are limited by the short follow-up time. In conclusion, BMI was inversely associated to PSA-levels. BMI should be taken into consideration when referring men to a prostate biopsy based on serum PSA-levels. © 2016 UICC.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Ortiz, M.; Graber, H. C.; Wilkinson, J.; Nyman, L. M.; Lund, B.
2017-12-01
Much work has been done on determining changes in summer ice albedo and morphological properties of melt ponds such as depth, shape and distribution using in-situ measurements and satellite-based sensors. Although these studies have dedicated much pioneering work in this area, there still lacks sufficient spatial and temporal scales. We present a prototype algorithm using Linear Support Vector Machines (LSVMs) designed to quantify the evolution of melt pond fraction from a recently government-declassified high-resolution panchromatic optical dataset. The study area of interest lies within the Beaufort marginal ice zone (MIZ), where several in-situ instruments were deployed by the British Antarctic Survey in joint with the MIZ Program, from April-September, 2014. The LSVM uses four dimensional feature data from the intensity image itself, and from various textures calculated from a modified first-order histogram technique using probability density of occurrences. We explore both the temporal evolution of melt ponds and spatial statistics such as pond fraction, pond area, and number pond density, to name a few. We also introduce a linear regression model that can potentially be used to estimate average pond area by ingesting several melt pond statistics and shape parameters.
Modeling exposure–lag–response associations with distributed lag non-linear models
Gasparrini, Antonio
2014-01-01
In biomedical research, a health effect is frequently associated with protracted exposures of varying intensity sustained in the past. The main complexity of modeling and interpreting such phenomena lies in the additional temporal dimension needed to express the association, as the risk depends on both intensity and timing of past exposures. This type of dependency is defined here as exposure–lag–response association. In this contribution, I illustrate a general statistical framework for such associations, established through the extension of distributed lag non-linear models, originally developed in time series analysis. This modeling class is based on the definition of a cross-basis, obtained by the combination of two functions to flexibly model linear or nonlinear exposure-responses and the lag structure of the relationship, respectively. The methodology is illustrated with an example application to cohort data and validated through a simulation study. This modeling framework generalizes to various study designs and regression models, and can be applied to study the health effects of protracted exposures to environmental factors, drugs or carcinogenic agents, among others. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24027094
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
Van Looy, Stijn; Verplancke, Thierry; Benoit, Dominique; Hoste, Eric; Van Maele, Georges; De Turck, Filip; Decruyenaere, Johan
2007-01-01
Tacrolimus is an important immunosuppressive drug for organ transplantation patients. It has a narrow therapeutic range, toxic side effects, and a blood concentration with wide intra- and interindividual variability. Hence, it is of the utmost importance to monitor tacrolimus blood concentration, thereby ensuring clinical effect and avoiding toxic side effects. Prediction models for tacrolimus blood concentration can improve clinical care by optimizing monitoring of these concentrations, especially in the initial phase after transplantation during intensive care unit (ICU) stay. This is the first study in the ICU in which support vector machines, as a new data modeling technique, are investigated and tested in their prediction capabilities of tacrolimus blood concentration. Linear support vector regression (SVR) and nonlinear radial basis function (RBF) SVR are compared with multiple linear regression (MLR). Tacrolimus blood concentrations, together with 35 other relevant variables from 50 liver transplantation patients, were extracted from our ICU database. This resulted in a dataset of 457 blood samples, on average between 9 and 10 samples per patient, finally resulting in a database of more than 16,000 data values. Nonlinear RBF SVR, linear SVR, and MLR were performed after selection of clinically relevant input variables and model parameters. Differences between observed and predicted tacrolimus blood concentrations were calculated. Prediction accuracy of the three methods was compared after fivefold cross-validation (Friedman test and Wilcoxon signed rank analysis). Linear SVR and nonlinear RBF SVR had mean absolute differences between observed and predicted tacrolimus blood concentrations of 2.31 ng/ml (standard deviation [SD] 2.47) and 2.38 ng/ml (SD 2.49), respectively. MLR had a mean absolute difference of 2.73 ng/ml (SD 3.79). The difference between linear SVR and MLR was statistically significant (p < 0.001). RBF SVR had the advantage of requiring only 2 input variables to perform this prediction in comparison to 15 and 16 variables needed by linear SVR and MLR, respectively. This is an indication of the superior prediction capability of nonlinear SVR. Prediction of tacrolimus blood concentration with linear and nonlinear SVR was excellent, and accuracy was superior in comparison with an MLR model.
NASA Technical Reports Server (NTRS)
Ratnayake, Nalin A.; Koshimoto, Ed T.; Taylor, Brian R.
2011-01-01
The problem of parameter estimation on hybrid-wing-body type aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aero- dynamic control effectors that act in coplanar motion. This fact adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of system inputs must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, asymmetric, single-surface maneuvers are used to excite multiple axes of aircraft motion simultaneously. Time history reconstructions of the moment coefficients computed by the solved regression models are then compared to each other in order to assess relative model accuracy. The reduced flight-test time required for inner surface parameter estimation using multi-axis methods was found to come at the cost of slightly reduced accuracy and statistical confidence for linear regression methods. Since the multi-axis maneuvers captured parameter estimates similar to both longitudinal and lateral-directional maneuvers combined, the number of test points required for the inner, aileron-like surfaces could in theory have been reduced by 50%. While trends were similar, however, individual parameters as estimated by a multi-axis model were typically different by an average absolute difference of roughly 15-20%, with decreased statistical significance, than those estimated by a single-axis model. The multi-axis model exhibited an increase in overall fit error of roughly 1-5% for the linear regression estimates with respect to the single-axis model, when applied to flight data designed for each, respectively.
Wagner, Brian J.; Gorelick, Steven M.
1986-01-01
A simulation nonlinear multiple-regression methodology for estimating parameters that characterize the transport of contaminants is developed and demonstrated. Finite difference contaminant transport simulation is combined with a nonlinear weighted least squares multiple-regression procedure. The technique provides optimal parameter estimates and gives statistics for assessing the reliability of these estimates under certain general assumptions about the distributions of the random measurement errors. Monte Carlo analysis is used to estimate parameter reliability for a hypothetical homogeneous soil column for which concentration data contain large random measurement errors. The value of data collected spatially versus data collected temporally was investigated for estimation of velocity, dispersion coefficient, effective porosity, first-order decay rate, and zero-order production. The use of spatial data gave estimates that were 2–3 times more reliable than estimates based on temporal data for all parameters except velocity. Comparison of estimated linear and nonlinear confidence intervals based upon Monte Carlo analysis showed that the linear approximation is poor for dispersion coefficient and zero-order production coefficient when data are collected over time. In addition, examples demonstrate transport parameter estimation for two real one-dimensional systems. First, the longitudinal dispersivity and effective porosity of an unsaturated soil are estimated using laboratory column data. We compare the reliability of estimates based upon data from individual laboratory experiments versus estimates based upon pooled data from several experiments. Second, the simulation nonlinear regression procedure is extended to include an additional governing equation that describes delayed storage during contaminant transport. The model is applied to analyze the trends, variability, and interrelationship of parameters in a mourtain stream in northern California.
Malignant testicular tumour incidence and mortality trends
Wojtyła-Buciora, Paulina; Więckowska, Barbara; Krzywinska-Wiewiorowska, Małgorzata; Gromadecka-Sutkiewicz, Małgorzata
2016-01-01
Aim of the study In Poland testicular tumours are the most frequent cancer among men aged 20–44 years. Testicular tumour incidence since the 1980s and 1990s has been diversified geographically, with an increased risk of mortality in Wielkopolska Province, which was highlighted at the turn of the 1980s and 1990s. The aim of the study was the comparative analysis of the tendencies in incidence and death rates due to malignant testicular tumours observed among men in Poland and in Wielkopolska Province. Material and methods Data from the National Cancer Registry were used for calculations. The incidence/mortality rates among men due to malignant testicular cancer as well as the tendencies in incidence/death ratio observed in Poland and Wielkopolska were established based on regression equation. The analysis was deepened by adopting the multiple linear regression model. A p-value < 0.05 was arbitrarily adopted as the criterion of statistical significance, and for multiple comparisons it was modified according to the Bonferroni adjustment to a value of p < 0.0028. Calculations were performed with the use of PQStat v1.4.8 package. Results The incidence of malignant testicular neoplasms observed among men in Poland and in Wielkopolska Province indicated a significant rising tendency. The multiple linear regression model confirmed that the year variable is a strong incidence forecast factor only within the territory of Poland. A corresponding analysis of mortality rates among men in Poland and in Wielkopolska Province did not show any statistically significant correlations. Conclusions Late diagnosis of Polish patients calls for undertaking appropriate educational activities that would facilitate earlier reporting of the patients, thus increasing their chances for recovery. Introducing preventive examinations in the regions of increased risk of testicular tumour may allow earlier diagnosis. PMID:27095941
Correlation of Vitamin D status and orthodontic-induced external apical root resorption
Tehranchi, Azita; Sadighnia, Azin; Younessian, Farnaz; Abdi, Amir H.; Shirvani, Armin
2017-01-01
Background: Adequate Vitamin D is essential for dental and skeletal health in children and adult. The purpose of this study was to assess the correlation of serum Vitamin D level with external-induced apical root resorption (EARR) following fixed orthodontic treatment. Materials and Methods: In this cross-sectional study, the prevalence of Vitamin D deficiency (defined by25-hydroxyvitamin-D) was determined in 34 patients (23.5% male; age range 12–23 years; mean age 16.63 ± 2.84) treated with fixed orthodontic treatment. Root resorption of four maxillary incisors was measured using before and after periapical radiographs (136 measured teeth) by means of a design-to-purpose software to optimize data collection. Teeth with a maximum percentage of root resorption (%EARR) were indicated as representative root resorption for each patient. A multiple linear regression model and Pearson correlation coefficient were used to assess the association of Vitamin D status and observed EARR. P < 0.05 was considered statistically significant. Results: The Pearson coefficient between these two variables was determined about 0.15 (P = 0.38). Regression analysis revealed that Vitamin D status of the patients demonstrated no significant statistical correlation with EARR, after adjustment of confounding variables using linear regression model (P > 0.05). Conclusion: This study suggests that Vitamin D level is not among the clinical variables that are potential contributors for EARR. The prevalence of Vitamin D deficiency does not differ in patients with higher EARR. These data suggest the possibility that Vitamin D insufficiency may not contribute to the development of more apical root resorption although this remains to be confirmed by further longitudinal cohort studies. PMID:29238379
A novel multi-target regression framework for time-series prediction of drug efficacy.
Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin
2017-01-18
Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task.
A novel multi-target regression framework for time-series prediction of drug efficacy
Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin
2017-01-01
Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task. PMID:28098186
Demidenko, Eugene
2017-09-01
The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A
2013-05-01
This manuscript discusses the application and the comparison between three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to the high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug Acyclovir which was analyzed in human plasma with the use of ganciclovir as internal standard. In vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-points sin x i polynomials (discrete Fourier functions). This work studies and also compares the application of WR method and Theil's method, a nonparametric regression (NPR) method with the least squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use WR method. WR was found to be superior to the method of LSPR as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to the method of LSPR as the former assumes that errors could occur in both x- and y-directions and that might not be normally distributed. Most of the results showed a significant improvement in the precision and accuracy on applying WR and NPR methods relative to LSPR.
Tenan, Matthew S; Tweedell, Andrew J; Haynes, Courtney A
2017-01-01
The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60-90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms perform equally well when the time series has multiple bursts of muscle activity.
Perry, Charles A.; Wolock, David M.; Artman, Joshua C.
2004-01-01
Streamflow statistics of flow duration and peak-discharge frequency were estimated for 4,771 individual locations on streams listed on the 1999 Kansas Surface Water Register. These statistics included the flow-duration values of 90, 75, 50, 25, and 10 percent, as well as the mean flow value. Peak-discharge frequency values were estimated for the 2-, 5-, 10-, 25-, 50-, and 100-year floods. Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating flow-duration values of 90, 75, 50, 25, and 10 percent and the mean flow for uncontrolled flow stream locations. The contributing-drainage areas of 149 U.S. Geological Survey streamflow-gaging stations in Kansas and parts of surrounding States that had flow uncontrolled by Federal reservoirs and used in the regression analyses ranged from 2.06 to 12,004 square miles. Logarithmic transformations of climatic and basin data were performed to yield the best linear relation for developing equations to compute flow durations and mean flow. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were contributing-drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. The analyses yielded a model standard error of prediction range of 0.43 logarithmic units for the 90-percent duration analysis to 0.15 logarithmic units for the 10-percent duration analysis. The model standard error of prediction was 0.14 logarithmic units for the mean flow. Regression equations used to estimate peak-discharge frequency values were obtained from a previous report, and estimates for the 2-, 5-, 10-, 25-, 50-, and 100-year floods were determined for this report. The regression equations and an interpolation procedure were used to compute flow durations, mean flow, and estimates of peak-discharge frequency for locations along uncontrolled flow streams on the 1999 Kansas Surface Water Register. Flow durations, mean flow, and peak-discharge frequency values determined at available gaging stations were used to interpolate the regression-estimated flows for the stream locations where available. Streamflow statistics for locations that had uncontrolled flow were interpolated using data from gaging stations weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled reaches of Kansas streams, the streamflow statistics were interpolated between gaging stations using only gaged data weighted by drainage area.
Statistical modelling of networked human-automation performance using working memory capacity.
Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja
2014-01-01
This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.
Experimental and environmental factors affect spurious detection of ecological thresholds
Daily, Jonathan P.; Hitt, Nathaniel P.; Smith, David; Snyder, Craig D.
2012-01-01
Threshold detection methods are increasingly popular for assessing nonlinear responses to environmental change, but their statistical performance remains poorly understood. We simulated linear change in stream benthic macroinvertebrate communities and evaluated the performance of commonly used threshold detection methods based on model fitting (piecewise quantile regression [PQR]), data partitioning (nonparametric change point analysis [NCPA]), and a hybrid approach (significant zero crossings [SiZer]). We demonstrated that false detection of ecological thresholds (type I errors) and inferences on threshold locations are influenced by sample size, rate of linear change, and frequency of observations across the environmental gradient (i.e., sample-environment distribution, SED). However, the relative importance of these factors varied among statistical methods and between inference types. False detection rates were influenced primarily by user-selected parameters for PQR (τ) and SiZer (bandwidth) and secondarily by sample size (for PQR) and SED (for SiZer). In contrast, the location of reported thresholds was influenced primarily by SED. Bootstrapped confidence intervals for NCPA threshold locations revealed strong correspondence to SED. We conclude that the choice of statistical methods for threshold detection should be matched to experimental and environmental constraints to minimize false detection rates and avoid spurious inferences regarding threshold location.
Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume
2014-06-28
Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting.This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.
Statistical tools for transgene copy number estimation based on real-time PCR.
Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal
2007-11-01
As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progresses in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of trangene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
Meta-regression approximations to reduce publication selection bias.
Stanley, T D; Doucouliagos, Hristos
2014-03-01
Publication selection bias is a serious challenge to the integrity of all empirical sciences. We derive meta-regression approximations to reduce this bias. Our approach employs Taylor polynomial approximations to the conditional mean of a truncated distribution. A quadratic approximation without a linear term, precision-effect estimate with standard error (PEESE), is shown to have the smallest bias and mean squared error in most cases and to outperform conventional meta-analysis estimators, often by a great deal. Monte Carlo simulations also demonstrate how a new hybrid estimator that conditionally combines PEESE and the Egger regression intercept can provide a practical solution to publication selection bias. PEESE is easily expanded to accommodate systematic heterogeneity along with complex and differential publication selection bias that is related to moderator variables. By providing an intuitive reason for these approximations, we can also explain why the Egger regression works so well and when it does not. These meta-regression methods are applied to several policy-relevant areas of research including antidepressant effectiveness, the value of a statistical life, the minimum wage, and nicotine replacement therapy. Copyright © 2013 John Wiley & Sons, Ltd.
A Regression Framework for Effect Size Assessments in Longitudinal Modeling of Group Differences
Feingold, Alan
2013-01-01
The use of growth modeling analysis (GMA)--particularly multilevel analysis and latent growth modeling--to test the significance of intervention effects has increased exponentially in prevention science, clinical psychology, and psychiatry over the past 15 years. Model-based effect sizes for differences in means between two independent groups in GMA can be expressed in the same metric (Cohen’s d) commonly used in classical analysis and meta-analysis. This article first reviews conceptual issues regarding calculation of d for findings from GMA and then introduces an integrative framework for effect size assessments that subsumes GMA. The new approach uses the structure of the linear regression model, from which effect sizes for findings from diverse cross-sectional and longitudinal analyses can be calculated with familiar statistics, such as the regression coefficient, the standard deviation of the dependent measure, and study duration. PMID:23956615
Father and adolescent son variables related to son's HIV prevention.
Glenn, Betty L; Demi, Alice; Kimble, Laura P
2008-02-01
The purpose of this study was to examine the relationship between fathers' influences and African American male adolescents' perceptions of self-efficacy to reduce high-risk sexual behavior. A convenience sample of 70 fathers was recruited from churches in a large metropolitan area in the South. Hierarchical multiple linear regression analysis indicated father-related factors and son-related factors were associated with 26.1% of the variance in son's self-efficacy to be abstinent. In the regression model greater son's perception of the communication of sexual standards and greater father's perception of his son's self-efficacy were significantly related to greater son's self-efficacy for abstinence. The second regression model with son's self-efficacy for safer sex as the criterion was not statistically significant. Data support the need for fathers to express confidence in their sons' ability to be abstinent or practice safer sex and to communicate with their sons regarding sexual issues and standards.
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well has how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
Jacob, Benjamin J; Krapp, Fiorella; Ponce, Mario; Gottuzzo, Eduardo; Griffith, Daniel A; Novak, Robert J
2010-05-01
Spatial autocorrelation is problematic for classical hierarchical cluster detection tests commonly used in multi-drug resistant tuberculosis (MDR-TB) analyses as considerable random error can occur. Therefore, when MDRTB clusters are spatially autocorrelated the assumption that the clusters are independently random is invalid. In this research, a product moment correlation coefficient (i.e., the Moran's coefficient) was used to quantify local spatial variation in multiple clinical and environmental predictor variables sampled in San Juan de Lurigancho, Lima, Peru. Initially, QuickBird 0.61 m data, encompassing visible bands and the near infra-red bands, were selected to synthesize images of land cover attributes of the study site. Data of residential addresses of individual patients with smear-positive MDR-TB were geocoded, prevalence rates calculated and then digitally overlaid onto the satellite data within a 2 km buffer of 31 georeferenced health centers, using a 10 m2 grid-based algorithm. Geographical information system (GIS)-gridded measurements of each health center were generated based on preliminary base maps of the georeferenced data aggregated to block groups and census tracts within each buffered area. A three-dimensional model of the study site was constructed based on a digital elevation model (DEM) to determine terrain covariates associated with the sampled MDR-TB covariates. Pearson's correlation was used to evaluate the linear relationship between the DEM and the sampled MDR-TB data. A SAS/GIS(R) module was then used to calculate univariate statistics and to perform linear and non-linear regression analyses using the sampled predictor variables. The estimates generated from a global autocorrelation analyses were then spatially decomposed into empirical orthogonal bases using a negative binomial regression with a non-homogeneous mean. Results of the DEM analyses indicated a statistically non-significant, linear relationship between georeferenced health centers and the sampled covariate elevation. The data exhibited positive spatial autocorrelation and the decomposition of Moran's coefficient into uncorrelated, orthogonal map pattern components revealed global spatial heterogeneities necessary to capture latent autocorrelation in the MDR-TB model. It was thus shown that Poisson regression analyses and spatial eigenvector mapping can elucidate the mechanics of MDR-TB transmission by prioritizing clinical and environmental-sampled predictor variables for identifying high risk populations.
1974-01-01
REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
1993-03-01
statistical mathe- matics, began in the late 1800’s when Sir Francis Galton first attempted to use practical mathematical techniques to investigate the...randomly collected (sampled) many pairs of parent/child height mea- surements (data), Galton observed that for a given parent- height average, the...ty only Maximum Adjusted R2 will be discussed. However, Maximum Adjusted R’ and Minimum MSE test exactly the same 2.thing. Adjusted R is related to R
On Deriving and Solving the Generalized Bivariate, Linear Location Problems.
1982-09-01
average (Eisenhart, 1978). Francis Galton indirectly coined the term "regression" in his 1885 publication, Natural Inheritance, when he studied sweet...David, F. N. Francis Galton . In W. H. Kruskal & J. Tanur (Eds.), International encyclopedia of statistics (Vol. 1). New York: Free Press, 1978. Dean, W...mhhhEmhnhhEEEI I fllfllfllfllfllfllfl EEEMMhMhMhhhMhI 1111 . I 28 12.5 1.:, 1 2 . 1.21111 1 4 11111I. IIIII~ JIII1L MICROCOPY RESOLUTION TEST CHART NATIONAL
Element enrichment factor calculation using grain-size distribution and functional data regression.
Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R
2015-01-01
In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
A rational model of function learning.
Lucas, Christopher G; Griffiths, Thomas L; Williams, Joseph J; Kalish, Michael L
2015-10-01
Theories of how people learn relationships between continuous variables have tended to focus on two possibilities: one, that people are estimating explicit functions, or two that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, which provide a probabilistic basis for similarity-based function learning, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a rational model of human function learning that combines the strengths of both approaches and accounts for a wide variety of experimental results.
NASA Technical Reports Server (NTRS)
Clark, P. E.; Andre, C. G.; Adler, I.; Weidner, J.; Podwysocki, M.
1976-01-01
The positive correlation between Al/Si X-ray fluorescence intensity ratios determined during the Apollo 15 lunar mission and a broad-spectrum visible albedo of the moon is quantitatively established. Linear regression analysis performed on 246 1 degree geographic cells of X-ray fluorescence intensity and visible albedo data points produced a statistically significant correlation coefficient of .78. Three distinct distributions of data were identified as (1) within one standard deviation of the regression line, (2) greater than one standard deviation below the line, and (3) greater than one standard deviation above the line. The latter two distributions of data were found to occupy distinct geographic areas in the Palus Somni region.
Who Will Win?: Predicting the Presidential Election Using Linear Regression
ERIC Educational Resources Information Center
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models
NASA Astrophysics Data System (ADS)
Zaitchik, B. F.; Berhane, F.; Tadesse, T.
2015-12-01
We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (global map). The user can select the Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as predictand. They can also upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations of the variables selected with the predictand. The user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (composites). Then, the user can select some or all of the statistical prediction models provided. Provided models include Linear Regression models (GLM, SGLM), Tree-based models (bagging, random forest, boosting), Artificial Neural Network, and other non-linear models such as Generalized Additive Model (GAM) and Multivariate Adaptive Regression Splines (MARS). Finally, the user can download the analysis steps they used, such as the region they selected, the time period they specified, the predictand and predictors they chose and preprocessing options they used, and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS
Reflexion on linear regression trip production modelling method for ensuring good model quality
NASA Astrophysics Data System (ADS)
Suprayitno, Hitapriya; Ratnasari, Vita
2017-11-01
Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.
VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.
Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro
2016-01-01
In healthy individuals, behavioral outcomes are highly associated with the variability on brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. VoxelStats package has been developed in Matlab(®) and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to existing methods and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.
Fisher, Charles K; Mehta, Pankaj
2015-06-01
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach--the Bayesian Ising Approximation (BIA)-to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, are freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Jastreboff, P W
1979-06-01
Time histograms of neural responses evoked by sinuosidal stimulation often contain a slow drifting and an irregular noise which disturb Fourier analysis of these responses. Section 2 of this paper evaluates the extent to which a linear drift influences the Fourier analysis, and develops a combined Fourier and linear regression analysis for detecting and correcting for such a linear drift. Usefulness of this correcting method is demonstrated for the time histograms of actual eye movements and Purkinje cell discharges evoked by sinusoidal rotation of rabbits in the horizontal plane. In Sect. 3, the analysis of variance is adopted for estimating the probability of the random occurrence of the response curve extracted by Fourier analysis from noise. This method proved to be useful for avoiding false judgements as to whether the response curve was meaningful, particularly when the response was small relative to the contaminating noise.
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
Functional linear models to test for differences in prairie wetland hydraulic gradients
Greenwood, Mark C.; Sojda, Richard S.; Preston, Todd M.; Swayne, David A.; Yang, Wanhong; Voinov, A.A.; Rizzoli, A.; Filatova, T.
2010-01-01
Functional data analysis provides a framework for analyzing multiple time series measured frequently in time, treating each series as a continuous function of time. Functional linear models are used to test for effects on hydraulic gradient functional responses collected from three types of land use in Northeastern Montana at fourteen locations. Penalized regression-splines are used to estimate the underlying continuous functions based on the discretely recorded (over time) gradient measurements. Permutation methods are used to assess the statistical significance of effects. A method for accommodating missing observations in each time series is described. Hydraulic gradients may be an initial and fundamental ecosystem process that responds to climate change. We suggest other potential uses of these methods for detecting evidence of climate change.
Proton radius from electron scattering data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent
Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon and Stanford. Methods: We make use of stepwise regression techniques using the F-test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data as well as to provide multivariate errormore » estimates. Results: Starting with the precision, low four-momentum transfer (Q 2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F-test as well as the Akaike information criterion justify using a linear extrapolation which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on GE from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q 2 data on G E to select functions which extrapolate to high Q 2, we find that a Pad´e (N = M = 1) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, G E(Q 2) = (1 + Q 2/0.66 GeV 2) -2. Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm; either from linear extrapolation of the extreme low-Q 2 data or by use of the Pad´e approximant for extrapolation using a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering result and the muonic hydrogen result are consistent. Lastly, it is the atomic hydrogen results that are the outliers.« less
Proton radius from electron scattering data
Higinbotham, Douglas W.; Kabir, Al Amin; Lin, Vincent; ...
2016-05-31
Background: The proton charge radius extracted from recent muonic hydrogen Lamb shift measurements is significantly smaller than that extracted from atomic hydrogen and electron scattering measurements. The discrepancy has become known as the proton radius puzzle. Purpose: In an attempt to understand the discrepancy, we review high-precision electron scattering results from Mainz, Jefferson Lab, Saskatoon and Stanford. Methods: We make use of stepwise regression techniques using the F-test as well as the Akaike information criterion to systematically determine the predictive variables to use for a given set and range of electron scattering data as well as to provide multivariate errormore » estimates. Results: Starting with the precision, low four-momentum transfer (Q 2) data from Mainz (1980) and Saskatoon (1974), we find that a stepwise regression of the Maclaurin series using the F-test as well as the Akaike information criterion justify using a linear extrapolation which yields a value for the proton radius that is consistent with the result obtained from muonic hydrogen measurements. Applying the same Maclaurin series and statistical criteria to the 2014 Rosenbluth results on GE from Mainz, we again find that the stepwise regression tends to favor a radius consistent with the muonic hydrogen radius but produces results that are extremely sensitive to the range of data included in the fit. Making use of the high-Q 2 data on G E to select functions which extrapolate to high Q 2, we find that a Pad´e (N = M = 1) statistical model works remarkably well, as does a dipole function with a 0.84 fm radius, G E(Q 2) = (1 + Q 2/0.66 GeV 2) -2. Conclusions: Rigorous applications of stepwise regression techniques and multivariate error estimates result in the extraction of a proton charge radius that is consistent with the muonic hydrogen result of 0.84 fm; either from linear extrapolation of the extreme low-Q 2 data or by use of the Pad´e approximant for extrapolation using a larger range of data. Thus, based on a purely statistical analysis of electron scattering data, we conclude that the electron scattering result and the muonic hydrogen result are consistent. Lastly, it is the atomic hydrogen results that are the outliers.« less
Wei, R; Claes, P; Walters, M; Wholley, C; Clement, J G
2011-06-01
The facial region has traditionally been quantified using linear anthropometrics. These are well established in dentistry, but require expertise to be used effectively. The aim of this study was to augment the utility of linear anthropometrics by applying them in conjunction with modern 3-D morphometrics. Facial images of 75 males and 94 females aged 18-25 years with self-reported Caucasian ancestry were used. An anthropometric mask was applied to establish corresponding quasi-landmarks on the images in the dataset. A statistical face-space, encoding shape covariation, was established. The facial median plane was extracted facilitating both manual and automated indication of commonly used midline landmarks. From both indications, facial convexity angles were calculated and compared. The angles were related to the face-space using a regression based pathway enabling the visualization of facial form associated with convexity variation. Good agreement between the manual and automated angles was found (Pearson correlation: 0.9478-0.9474, Dahlberg root mean squared error: 1.15°-1.24°). The population mean angle was 166.59°-166.29° (SD 5.09°-5.2°) for males-females. The angle-pathway provided valuable feedback. Linear facial anthropometrics can be extended when used in combination with a face-space derived from 3-D scans and the exploration of property pathways inferred in a statistically verifiable way. © 2011 Australian Dental Association.
Association between blood cholesterol and sodium intake in hypertensive women with excess weight.
Padilha, Bruna Merten; Ferreira, Raphaela Costa; Bueno, Nassib Bezerra; Tassitano, Rafael Miranda; Holanda, Lidiana de Souza; Vasconcelos, Sandra Mary Lima; Cabral, Poliana Coelho
2018-04-01
Restricted sodium intake has been recommended for more than 1 century for the treatment of hypertension. However, restriction seems to increase blood cholesterol. In women with excess weight, blood cholesterol may increase even more because of insulin resistance and the high lipolytic activity of adipose tissue.The aim of this study was to assess the association between blood cholesterol and sodium intake in hypertensive women with and without excess weight.This was a cross-sectional study with hypertensive and nondiabetic women aged 20 to 59 years, recruited at the primary healthcare units of Maceio, Alagoas, Brazilian Northeast. Excess weight was defined as body mass index (BMI) ≥25.0 kg/m. Sodium intake was estimated by the 24-hour urinary excretion of sodium. Blood cholesterol was the primary outcome investigated by this study, and its relationship with sodium intake and other variables was assessed by Pearson correlation and multivariate linear regression using a significance level of 5%.This study included 165 hypertensive women. Of these, 135 (81.8%) were with excess weight. The mean sodium intake was 3.7 g (±1.9) and 3.4 g (±2.4) in hypertensive women with and without excess weight, respectively. The multiple normal linear regression models fitted to the "blood cholesterol" in the 2 groups reveal that for the group of hypertensive women without excess weight only 1 independent variable "age" is statistically significant to explain the variability of the blood cholesterol levels. However, for the group of hypertensive women with excess weight, 2 independent variables, age and sodium intake, can statistically explain variations of the blood cholesterol levels.Blood cholesterol is statistically inversely related to sodium intake for hypertensive women with excess weight, but it is not statistically related to sodium intake for hypertensive women without excess weight.
Hershman, Dawn L; Unger, Joseph M; Crew, Katherine D; Till, Cathee; Greenlee, Heather; Minasian, Lori M; Moinpour, Carol M; Lew, Danika L; Fehrenbacher, Louis; Wade, James L; Wong, Siu-Fun; Fisch, Michael J; Lynn Henry, N; Albain, Kathy S
2018-06-01
Chemotherapy-induced peripheral neuropathy (CIPN) is a common and disabling side effect of taxanes. Acetyl-L-carnitine (ALC) was unexpectedly found to increase CIPN in a randomized trial. We investigated the long-term patterns of CIPN among patients in this trial. S0715 was a randomized, double-blind, multicenter trial comparing ALC (1000 mg three times a day) with placebo for 24 weeks in women undergoing adjuvant taxane-based chemotherapy for breast cancer. CIPN was measured by the 11-item neurotoxicity (NTX) component of the FACT-Taxane scale at weeks 12, 24, 36, 52, and 104. We examined NTX scores over two years using linear mixed models for longitudinal data. Individual time points were examined using linear regression. Regression analyses included stratification factors and the baseline score as covariates. All statistical tests were two-sided. Four-hundred nine subjects were eligible for evaluation. Patients receiving ALC had a statistically significantly (P = .01) greater reduction in NTX scores (worse CIPN) of -1.39 points (95% confidence interval [CI] = -2.48 to -0.30) than the placebo group. These differences were particularly evident at weeks 24 (-1.68, 95% CI = -3.02 to -0.33), 36 (-1.37, 95% CI = -2.69 to -0.04), and 52 (-1.83, 95% CI = -3.35 to -0.32). At 104 weeks, 39.5% on the ALC arm and 34.4% on the placebo arm reported a five-point (10%) decrease from baseline. For both treatment groups, 104-week NTX scores were statistically significantly different compared with baseline (P < .001). For both groups, NTX scores were reduced from baseline and remained persistently low. Twenty-four weeks of ALC therapy resulted in statistically significantly worse CIPN over two years. Understanding the mechanism of this persistent effect may inform prevention and treatment strategies. Until then, the potential efficacy and harms of commonly used supplements should be rigorously studied.
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.
Daily commuting to work is not associated with variables of health.
Mauss, Daniel; Jarczok, Marc N; Fischer, Joachim E
2016-01-01
Commuting to work is thought to have a negative impact on employee health. We tested the association of work commute and different variables of health in German industrial employees. Self-rated variables of an industrial cohort (n = 3805; 78.9 % male) including absenteeism, presenteeism and indices reflecting stress and well-being were assessed by a questionnaire. Fasting blood samples, heart-rate variability and anthropometric data were collected. Commuting was grouped into one of four categories: 0-19.9, 20-44.9, 45-59.9, ≥60 min travelling one way to work. Bivariate associations between commuting and all variables under study were calculated. Linear regression models tested this association further, controlling for potential confounders. Commuting was positively correlated with waist circumference and inversely with triglycerides. These associations did not remain statistically significant in linear regression models controlling for age, gender, marital status, and shiftwork. No other association with variables of physical, psychological, or mental health and well-being could be found. The results indicate that commuting to work has no significant impact on well-being and health of German industrial employees.
Quantifying and Reducing Curve-Fitting Uncertainty in Isc
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campanelli, Mark; Duck, Benjamin; Emery, Keith
2015-06-14
Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data pointsmore » can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.« less
NASA Astrophysics Data System (ADS)
Donroman, T.; Chesoh, S.; Lim, A.
2018-04-01
This study aimed to investigate the variation patterns of fish fingerling abundance based on month, year and sampling site. Monthly collecting data set of the Na Thap tidal river of southern Thailand, were obtained from June 2005 to October 2015. The square root transformation was employed for maintaining the fingerling data normality. Factor analysis was applied for clustering number of fingerling species and multiple linear regression was used to examine the association between fingerling density and year, month and site. Results from factor analysis classified fingerling into 3 factors based on saline preference; saline water, freshwater and ubiquitous species. The results showed a statistically high significant relation between fingerling density, month, year and site. Abundance of saline water and ubiquitous fingerling density showed similar pattern. Downstream site presented highest fingerling density whereas almost of freshwater fingerling occurred in upstream. This finding confirmed that factor analysis and the general linear regression method can be used as an effective tool for predicting and monitoring wild fingerling density in order to sustain fish stock management.
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series. PMID:24895666
Quantifying and Reducing Curve-Fitting Uncertainty in Isc: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campanelli, Mark; Duck, Benjamin; Emery, Keith
Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data pointsmore » can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.« less
Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties–sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen–in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models–multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)–were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness, coloration and saturation were prominent predictors in digital soil mapping. Considering the increased availability of freely available Remote Sensing data (e.g. Landsat, SRTM, Sentinels), soil information at local and regional scales in data poor regions such as West Africa can be improved with relatively little financial and human resources. PMID:28114334
Preliminary Survey on TRY Forest Traits and Growth Index Relations - New Challenges
NASA Astrophysics Data System (ADS)
Lyubenova, Mariyana; Kattge, Jens; van Bodegom, Peter; Chikalanov, Alexandre; Popova, Silvia; Zlateva, Plamena; Peteva, Simona
2016-04-01
Forest ecosystems provide critical ecosystem goods and services, including food, fodder, water, shelter, nutrient cycling, and cultural and recreational value. Forests also store carbon, provide habitat for a wide range of species and help alleviate land degradation and desertification. Thus they have a potentially significant role to play in climate change adaptation planning through maintaining ecosystem services and providing livelihood options. Therefore the study of forest traits is such an important issue not just for individual countries but for the planet as a whole. We need to know what functional relations between forest traits exactly can express TRY data base and haw it will be significant for the global modeling and IPBES. The study of the biodiversity characteristics at all levels and functional links between them is extremely important for the selection of key indicators for assessing biodiversity and ecosystem services for sustainable natural capital control. By comparing the available information in tree data bases: TRY, ITR (International Tree Ring) and SP-PAM the 42 tree species are selected for the traits analyses. The dependence between location characteristics (latitude, longitude, altitude, annual precipitation, annual temperature and soil type) and forest traits (specific leaf area, leaf weight ratio, wood density and growth index) is studied by by multiply regression analyses (RDA) using the statistical software package Canoco 4.5. The Pearson correlation coefficient (measure of linear correlation), Kendal rank correlation coefficient (non parametric measure of statistical dependence) and Spearman correlation coefficient (monotonic function relationship between two variables) are calculated for each pair of variables (indexes) and species. After analysis of above mentioned correlation coefficients the dimensional linear regression models, multidimensional linear and nonlinear regression models and multidimensional neural networks models are built. The strongest dependence between It and WD was obtained. The research will support the work on: Strategic Plan for Biodiversity 2011-2020, modelling and implementation of ecosystem-based approaches to climate change adaptation and disaster risk reduction. Key words: Specific leaf area (SLA), Leaf weight ratio (LWR), Wood density (WD), Growth index (It)
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties-sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen-in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models-multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)-were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness, coloration and saturation were prominent predictors in digital soil mapping. Considering the increased availability of freely available Remote Sensing data (e.g. Landsat, SRTM, Sentinels), soil information at local and regional scales in data poor regions such as West Africa can be improved with relatively little financial and human resources.
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Thomas, Christoph; Brodoefel, Harald; Tsiflikas, Ilias; Bruckner, Friederike; Reimann, Anja; Ketelsen, Dominik; Drosch, Tanja; Claussen, Claus D; Kopp, Andreas; Heuschmid, Martin; Burgstahler, Christof
2010-02-01
To prospectively evaluate the influence of the clinical pretest probability assessed by the Morise score onto image quality and diagnostic accuracy in coronary dual-source computed tomography angiography (DSCTA). In 61 patients, DSCTA and invasive coronary angiography were performed. Subjective image quality and accuracy for stenosis detection (>50%) of DSCTA with invasive coronary angiography as gold standard were evaluated. The influence of pretest probability onto image quality and accuracy was assessed by logistic regression and chi-square testing. Correlations of image quality and accuracy with the Morise score were determined using linear regression. Thirty-eight patients were categorized into the high, 21 into the intermediate, and 2 into the low probability group. Accuracies for the detection of significant stenoses were 0.94, 0.97, and 1.00, respectively. Logistic regressions and chi-square tests showed statistically significant correlations between Morise score and image quality (P < .0001 and P < .001) and accuracy (P = .0049 and P = .027). Linear regression revealed a cutoff Morise score for a good image quality of 16 and a cutoff for a barely diagnostic image quality beyond the upper Morise scale. Pretest probability is a weak predictor of image quality and diagnostic accuracy in coronary DSCTA. A sufficient image quality for diagnostic images can be reached with all pretest probabilities. Therefore, coronary DSCTA might be suitable also for patients with a high pretest probability. Copyright 2010 AUR. Published by Elsevier Inc. All rights reserved.
Changes in aerobic power of women, ages 20-64 yr
NASA Technical Reports Server (NTRS)
Jackson, A. S.; Wier, L. T.; Ayers, G. W.; Beard, E. F.; Stuteville, J. E.; Blair, S. N.
1996-01-01
This study quantified and compared the cross-sectional and longitudinal influence of age, self-report physical activity (SR-PA), and body composition (%fat) on the decline of maximal aerobic power (VO2peak) of women. The cross-sectional sample consisted of 409 healthy women, ages 20-64 yr. The 43 women of the longitudinal sample were from the same population and examined twice, the mean time between tests was 3.7 (+/-2.2) yr. Peak oxygen uptake was determined by indirect calorimetry during a maximal treadmill test. The zero-order correlation of -0.742 between VO2peak and %fat was significantly (P < 0.05) higher then the SR-PA (r = 0.626) and age correlations (r = -0.633). Linear regression defined the cross-sectional age-related decline in VO2peak at 0.537 ml.kg-1.min-1.yr-1. Multiple regression analysis (R = 0.851) showed that adding %fat and SR-PA and their interaction to the regression model reduced the age regression weight of -0.537, to -0.265 ml.kg-1.min-1.yr-1. Statistically controlling for time differences between tests, general linear models analysis showed that longitudinal changes in aerobic power were due to independent changes in %fat and SR-PA, confirming the cross-sectional results. These findings are consistent with men's data from the same lab showing that about 50% of the cross-sectional age-related decline in VO2peak was due to %fat and SR-PA.
A reliable and cost effective approach for radiographic monitoring in nutritional rickets.
Chatterjee, D; Gupta, V; Sharma, V; Sinha, B; Samanta, S
2014-04-01
Radiological scoring is particularly useful in rickets, where pre-treatment radiographical findings can reflect the disease severity and can be used to monitor the improvement. However, there is only a single radiographic scoring system for rickets developed by Thacher and, to the best of our knowledge, no study has evaluated radiographic changes in rickets based on this scoring system apart from the one done by Thacher himself. The main objective of this study is to compare and analyse the pre-treatment and post-treatment radiographic parameters in nutritional rickets with the help of Thacher's scoring technique. 176 patients with nutritional rickets were given a single intramuscular injection of vitamin D (600 000 IU) along with oral calcium (50 mg kg(-1)) and vitamin D (400 IU per day) until radiological resolution and followed for 1 year. Pre- and post-treatment radiological parameters were compared and analysed statistically based on Thacher's scoring system. Radiological resolution was complete by 6 months. Time for radiological resolution and initial radiological score were linearly associated on regression analysis. The distal ulna was the last to heal in most cases except when the initial score was 10, when distal femur was the last to heal. Thacher's scoring system can effectively monitor nutritional rickets. The formula derived through linear regression has prognostic significance. The distal femur is a better indicator in radiologically severe rickets and when resolution is delayed. Thacher's scoring is very useful for monitoring of rickets. The formula derived through linear regression can predict the expected time for radiological resolution.
Vyskocil, Erich; Gruther, Wolfgang; Steiner, Irene; Schuhfried, Othmar
2014-07-01
Disease-specific categories of the International Classification of Functioning, Disability and Health have not yet been described for patients with chronic peripheral arterial obstructive disease (PAD). The authors examined the relationship between the categories of the Brief Core Sets for ischemic heart diseases with the Peripheral Artery Questionnaire and the ankle-brachial index to determine which International Classification of Functioning, Disability and Health categories are most relevant for patients with PAD. This is a retrospective cohort study including 77 patients with verified PAD. Statistical analyses of the relationship between International Classification of Functioning, Disability and Health categories as independent variables and the endpoints Peripheral Artery Questionnaire or ankle-brachial index were carried out by simple and stepwise linear regression models adjusting for age, sex, and leg (left vs. right). The stepwise linear regression model with the ankle-brachial index as dependent variable revealed a significant effect of the variables blood vessel functions and muscle endurance functions. Calculating a stepwise linear regression model with the Peripheral Artery Questionnaire as dependent variable, a significant effect of age, emotional functions, energy and drive functions, carrying out daily routine, as well as walking could be observed. This study identifies International Classification of Functioning, Disability and Health categories in the Brief Core Sets for ischemic heart diseases that show a significant effect on the ankle-brachial index and the Peripheral Artery Questionnaire score in patients with PAD. These categories provide fundamental information on functioning of patients with PAD and patient-centered outcomes for rehabilitation interventions.
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate the simple linear regression. The first one is based on the famous Galton's data set on heredity. We use the lm R command and get coefficients estimates, standard error of the error, R2, residuals …In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate on multiple linear regression. This pratical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Mahdi, Adam, E-mail: amahdi@ncsu.edu; Majda, Andrew J., E-mail: jonjon@cims.nyu.edu
2014-01-15
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partialmore » noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.« less
Marcos, Raül; Llasat, Ma Carmen; Quintana-Seguí, Pere; Turco, Marco
2018-01-01
In this paper, we have compared different bias correction methodologies to assess whether they could be advantageous for improving the performance of a seasonal prediction model for volume anomalies in the Boadella reservoir (northwestern Mediterranean). The bias correction adjustments have been applied on precipitation and temperature from the European Centre for Middle-range Weather Forecasting System 4 (S4). We have used three bias correction strategies: two linear (mean bias correction, BC, and linear regression, LR) and one non-linear (Model Output Statistics analogs, MOS-analog). The results have been compared with climatology and persistence. The volume-anomaly model is a previously computed Multiple Linear Regression that ingests precipitation, temperature and in-flow anomaly data to simulate monthly volume anomalies. The potential utility for end-users has been assessed using economic value curve areas. We have studied the S4 hindcast period 1981-2010 for each month of the year and up to seven months ahead considering an ensemble of 15 members. We have shown that the MOS-analog and LR bias corrections can improve the original S4. The application to volume anomalies points towards the possibility to introduce bias correction methods as a tool to improve water resource seasonal forecasts in an end-user context of climate services. Particularly, the MOS-analog approach gives generally better results than the other approaches in late autumn and early winter. Copyright © 2017 Elsevier B.V. All rights reserved.
Bageshwar, Deepak; Khanvilkar, Vineeta; Kadam, Vilasrao
2011-01-01
A specific, precise and stability indicating high-performance thin-layer chromatographic method for simultaneous estimation of pantoprazole sodium and itopride hydrochloride in pharmaceutical formulations was developed and validated. The method employed TLC aluminium plates precoated with silica gel 60F254 as the stationary phase. The solvent system consisted of methanol:water:ammonium acetate; 4.0:1.0:0.5 (v/v/v). This system was found to give compact and dense spots for both itopride hydrochloride (Rf value of 0.55±0.02) and pantoprazole sodium (Rf value of 0.85±0.04). Densitometric analysis of both drugs was carried out in the reflectance–absorbance mode at 289 nm. The linear regression analysis data for the calibration plots showed a good linear relationship with R2=0.9988±0.0012 in the concentration range of 100–400 ng for pantoprazole sodium. Also, the linear regression analysis data for the calibration plots showed a good linear relationship with R2=0.9990±0.0008 in the concentration range of 200–1200 ng for itopride hydrochloride. The method was validated for specificity, precision, robustness and recovery. Statistical analysis proves that the method is repeatable and selective for the estimation of both the said drugs. As the method could effectively separate the drug from its degradation products, it can be employed as a stability indicating method. PMID:29403710
Bageshwar, Deepak; Khanvilkar, Vineeta; Kadam, Vilasrao
2011-11-01
A specific, precise and stability indicating high-performance thin-layer chromatographic method for simultaneous estimation of pantoprazole sodium and itopride hydrochloride in pharmaceutical formulations was developed and validated. The method employed TLC aluminium plates precoated with silica gel 60F 254 as the stationary phase. The solvent system consisted of methanol:water:ammonium acetate; 4.0:1.0:0.5 (v/v/v). This system was found to give compact and dense spots for both itopride hydrochloride ( R f value of 0.55±0.02) and pantoprazole sodium ( R f value of 0.85±0.04). Densitometric analysis of both drugs was carried out in the reflectance-absorbance mode at 289 nm. The linear regression analysis data for the calibration plots showed a good linear relationship with R 2 =0.9988±0.0012 in the concentration range of 100-400 ng for pantoprazole sodium. Also, the linear regression analysis data for the calibration plots showed a good linear relationship with R 2 =0.9990±0.0008 in the concentration range of 200-1200 ng for itopride hydrochloride. The method was validated for specificity, precision, robustness and recovery. Statistical analysis proves that the method is repeatable and selective for the estimation of both the said drugs. As the method could effectively separate the drug from its degradation products, it can be employed as a stability indicating method.
Møller, Anne; Reventlow, Susanne; Hansen, Åse Marie; Andersen, Lars L; Siersma, Volkert; Lund, Rikke; Avlund, Kirsten; Andersen, Johan Hviid; Mortensen, Ole Steen
2015-01-01
Objectives Our aim was to study associations between physical exposures throughout working life and physical function measured as chair-rise performance in midlife. Methods The Copenhagen Aging and Midlife Biobank (CAMB) provided data about employment and measures of physical function. Individual job histories were assigned exposures from a job exposure matrix. Exposures were standardised to ton-years (lifting 1000 kg each day in 1 year), stand-years (standing/walking for 6 h each day in 1 year) and kneel-years (kneeling for 1 h each day in 1 year). The associations between exposure-years and chair-rise performance (number of chair-rises in 30 s) were analysed in multivariate linear and non-linear regression models adjusted for covariates. Results Mean age among the 5095 participants was 59 years in both genders, and, on average, men achieved 21.58 (SD=5.60) and women 20.38 (SD=5.33) chair-rises in 30 s. Physical exposures were associated with poorer chair-rise performance in both men and women, however, only associations between lifting and standing/walking and chair-rise remained statistically significant among men in the final model. Spline regression analyses showed non-linear associations and confirmed the findings. Conclusions Higher physical exposure throughout working life is associated with slightly poorer chair-rise performance. The associations between exposure and outcome were non-linear. PMID:26537502
Jakovljevic, Aleksandar; Lazic, Emira; Soldatovic, Ivan; Nedeljkovic, Nenad; Andric, Miroslav
2015-07-01
To analyze radiographic predictors for lower third molar eruption among subjects with different anteroposterior skeletal relations and of different age groups. In total, 300 lower third molars were recorded on diagnostic digital orthopantomograms (DPTs) and lateral cephalograms (LCs). The radiographs were grouped according to sagittal intermaxillary angle (ANB), subject age, and level of lower third molar eruption. The DPT was used to analyze retromolar space, mesiodistal crown width, space/width ratio, third and second molar angulation (α, γ), third molar inclination (β), and gonion angle. The LC was used to determine ANB, angles of maxillar and mandibular prognathism (SNA, SNB), mandibular plane angle (SN/MP), and mandibular lengths. A logistic regression model was created using the statistically significant predictors. The logistic regression analysis revealed a statistically significant impact of β angle and distance between gonion and gnathion (Go-Gn) on the level of lower third molar eruption (P < .001 and P < .015, respectively). The retromolar space was significantly increased in the adult subgroup for all skeletal classes. The lower third molar impaction rate was significantly higher in the adult subgroup with the Class II (62.3%) compared with Class III subjects (31.7%; P < .013). The most favorable values of linear and angular predictors of mandibular third molar eruption were measured in Class III subjects. For valid estimation of mandibular third molar eruption, certain linear and angular measures (β angle, Go-Gn), as well as the size of the retromolar space, need to be considered.
Climate patterns as predictors of amphibians species richness and indicators of potential stress
Battaglin, W.; Hay, L.; McCabe, G.; Nanjappa, P.; Gallant, Alisa L.
2005-01-01
Amphibians occupy a range of habitats throughout the world, but species richness is greatest in regions with moist, warm climates. We modeled the statistical relations of anuran and urodele species richness with mean annual climate for the conterminous United States, and compared the strength of these relations at national and regional levels. Model variables were calculated for county and subcounty mapping units, and included 40-year (1960-1999) annual mean and mean annual climate statistics, mapping unit average elevation, mapping unit land area, and estimates of anuran and urodele species richness. Climate data were derived from more than 7,500 first-order and cooperative meteorological stations and were interpolated to the mapping units using multiple linear regression models. Anuran and urodele species richness were calculated from the United States Geological Survey's Amphibian Research and Monitoring Initiative (ARMI) National Atlas for Amphibian Distributions. The national multivariate linear regression (MLR) model of anuran species richness had an adjusted coefficient of determination (R2) value of 0.64 and the national MLR model for urodele species richness had an R2 value of 0.45. Stratifying the United States by coarse-resolution ecological regions provided models for anUrans that ranged in R2 values from 0.15 to 0.78. Regional models for urodeles had R2 values. ranging from 0.27 to 0.74. In general, regional models for anurans were more strongly influenced by temperature variables, whereas precipitation variables had a larger influence on urodele models.
SU-F-R-20: Image Texture Features Correlate with Time to Local Failure in Lung SBRT Patients
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrews, M; Abazeed, M; Woody, N
Purpose: To explore possible correlation between CT image-based texture and histogram features and time-to-local-failure in early stage non-small cell lung cancer (NSCLC) patients treated with stereotactic body radiotherapy (SBRT).Methods and Materials: From an IRB-approved lung SBRT registry for patients treated between 2009–2013 we selected 48 (20 male, 28 female) patients with local failure. Median patient age was 72.3±10.3 years. Mean time to local failure was 15 ± 7.1 months. Physician-contoured gross tumor volumes (GTV) on the planning CT images were processed and 3D gray-level co-occurrence matrix (GLCM) based texture and histogram features were calculated in Matlab. Data were exported tomore » R and a multiple linear regression model was used to examine the relationship between texture features and time-to-local-failure. Results: Multiple linear regression revealed that entropy (p=0.0233, multiple R2=0.60) from GLCM-based texture analysis and the standard deviation (p=0.0194, multiple R2=0.60) from the histogram-based features were statistically significantly correlated with the time-to-local-failure. Conclusion: Image-based texture analysis can be used to predict certain aspects of treatment outcomes of NSCLC patients treated with SBRT. We found entropy and standard deviation calculated for the GTV on the CT images displayed a statistically significant correlation with and time-to-local-failure in lung SBRT patients.« less
Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling
Sinnott, Jennifer A.; Cai, Tianxi
2013-01-01
Summary Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai et al., 2011). In this paper, we derive testing and prediction methods for KM regression under the accelerated failure time model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. PMID:24328713
Roldan-Valadez, Ernesto; Garcia-Ulloa, Ana Cristina; Gonzalez-Gutierrez, Omar; Martinez-Lopez, Manuel
2011-01-01
Computed-assisted three-dimensional data (3D) allows for an accurate evaluation of volumes compared with traditional measurements. An in vitro method comparison between geometric volume and 3D volumetry to obtain reference data for pituitary volumes in normal pituitary glands (PGs) and PGs containing adenomas. Prospective, transverse, analytical study. Forty-eight subjects underwent brain magnetic resonance imaging (MRI) with 3D sequencing for computer-aided volumetry. PG phantom volumes by both methods were compared. Using the best volumetric method, volumes of normal PGs and PGs with adenoma were compared. Statistical analysis used the Bland-Altman method, t-statistics, effect size and linear regression analysis. Method comparison between 3D volumetry and geometric volume revealed a lower bias and precision for 3D volumetry. A total of 27 patients exhibited normal PGs (mean age, 42.07 ± 16.17 years), although length, height, width, geometric volume and 3D volumetry were greater in women than in men. A total of 21 patients exhibited adenomas (mean age 39.62 ± 10.79 years), and length, height, width, geometric volume and 3D volumetry were greater in men than in women, with significant volumetric differences. Age did not influence pituitary volumes on linear regression analysis. Results from the present study showed that 3D volumetry was more accurate than the geometric method. In addition, the upper normal limits of PGs overlapped with lower volume limits during early stage microadenomas.
Is adult gait less susceptible than paediatric gait to hip joint centre regression equation error?
Kiernan, D; Hosking, J; O'Brien, T
2016-03-01
Hip joint centre (HJC) regression equation error during paediatric gait has recently been shown to have clinical significance. In relation to adult gait, it has been inferred that comparable errors with children in absolute HJC position may in fact result in less significant kinematic and kinetic error. This study investigated the clinical agreement of three commonly used regression equation sets (Bell et al., Davis et al. and Orthotrak) for adult subjects against the equations of Harrington et al. The relationship between HJC position error and subject size was also investigated for the Davis et al. set. Full 3-dimensional gait analysis was performed on 12 healthy adult subjects with data for each set compared to Harrington et al. The Gait Profile Score, Gait Variable Score and GDI-kinetic were used to assess clinical significance while differences in HJC position between the Davis and Harrington sets were compared to leg length and subject height using regression analysis. A number of statistically significant differences were present in absolute HJC position. However, all sets fell below the clinically significant thresholds (GPS <1.6°, GDI-Kinetic <3.6 points). Linear regression revealed a statistically significant relationship for both increasing leg length and increasing subject height with decreasing error in anterior/posterior and superior/inferior directions. Results confirm a negligible clinical error for adult subjects suggesting that any of the examined sets could be used interchangeably. Decreasing error with both increasing leg length and increasing subject height suggests that the Davis set should be used cautiously on smaller subjects. Copyright © 2016 Elsevier B.V. All rights reserved.
Uveal Melanoma Regression after Brachytherapy: Relationship with Chromosome 3 Monosomy Status.
Salvi, Sachin M; Aziz, Hassan A; Dar, Suhail; Singh, Nakul; Hayden-Loreck, Brandy; Singh, Arun D
2017-07-01
The objective was to evaluate the relationship between the regression rate of ciliary body melanoma and choroidal melanoma after brachytherapy and chromosome 3 monosomy status. We conducted a prospective and consecutive case series of patients who underwent biopsy and brachytherapy for ciliary/choroidal melanoma. Tumor biopsy performed at the time of radiation plaque placement was analyzed with fluorescence in situ hybridization to determine the percentage of tumor cells with chromosome 3 monosomy. The regression rate was calculated as the percent change in tumor height at months 3, 6, and 12. The relationship between regression rate and tumor location, initial tumor height, and chromosome 3 monosomy (percentage) was assessed by univariate linear regression (R version 3.1.0). Of the 75 patients included in the study, 8 had ciliary body melanoma, and 67 were choroidal melanomas. The mean tumor height at the time of diagnosis was 5.2 mm (range: 1.90-13.00). The percentage composition of chromosome 3 monosomy ranged from 0-20% (n = 35) to 81-100% (n = 40). The regression of tumor height at months 3, 6, and 12 did not statistically correlate with tumor location (ciliary or choroidal), initial tumor height, or chromosome 3 monosomy (percentage). The regression rate of choroidal melanoma following brachytherapy did not correlate with chromosome 3 monosomy status.
QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.
Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan
2013-03-01
A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. The statistically significant model was developed with squared correlation coefficients (r(2)) 0.891 and cross validated r(2) (r(2) cv) 0.825. The developed model revealed that electronic, shape, size, geometry, substitution's information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2) pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, the domain analysis was carried out to evaluate the prediction reliability of external set molecules. The model was statistically robust and had good predictive power which can be successfully utilized for screening of new molecules.
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. This clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods is available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
Three-Dimensional City Determinants of the Urban Heat Island: A Statistical Approach
NASA Astrophysics Data System (ADS)
Chun, Bum Seok
There is no doubt that the Urban Heat Island (UHI) is a mounting problem in built-up environments, due to the energy retention by the surface materials of dense buildings, leading to increased temperatures, air pollution, and energy consumption. Much of the earlier research on the UHI has used two-dimensional (2-D) information, such as land uses and the distribution of vegetation. In the case of homogeneous land uses, it is possible to predict surface temperatures with reasonable accuracy with 2-D information. However, three-dimensional (3-D) information is necessary to analyze more complex sites, including dense building clusters. Recent research on the UHI has started to consider multi-dimensional models. The purpose of this research is to explore the urban determinants of the UHI, using 2-D/3-D urban information with statistical modeling. The research includes the following stages: (a) estimating urban temperature, using satellite images, (b) developing a 3-D city model by LiDAR data, (c) generating geometric parameters with regard to 2-/3-D geospatial information, and (d) conducting different statistical analyses: OLS and spatial regressions. The research area is part of the City of Columbus, Ohio. To effectively and systematically analyze the UHI, hierarchical grid scales (480m, 240m, 120m, 60m, and 30m) are proposed, together with linear and the log-linear regression models. The non-linear OLS models with Log(AST) as dependent variable have the highest R2 among all the OLS-estimated models. However, both SAR and GSM models are estimated for the 480m, 240m, 120m, and 60m grids to reduce their spatial dependency. Most GSM models have R2s higher than 0.9, except for the 240m grid. Overall, the urban characteristics having high impacts in all grids are embodied in solar radiation, 3-D open space, greenery, and water streams. These results demonstrate that it is possible to mitigate the UHI, providing guidelines for policies aiming to reduce the UHI.
Depression and Anxiety in Greek Male Veterans After Retirement.
Kypraiou, Aspa; Sarafis, Pavlos; Tsounis, Andreas; Bitsi, Georgia; Andreanides, Elias; Constantinidis, Theodoros; Kotrotsiou, Evaggelia; Malliarou, Maria
2017-03-01
Retirement is a turning point in human life, resulting in changes to physical and mental health status. The aim of this study was to examine the factors that are related with depression and anxiety symptoms in Greek male veterans after retirement. A total of 502 veterans participated in a cross-sectional study. Beck Depression Inventory for depression assessment and Spielberger Trait Anxiety Inventory for anxiety assessment were used. The Ethics Committee of the Technological Educational Institution of Thessaly granted permission for conducting the research, and informed consent was obtained from all the participants. Questionnaires were filled in electronically using a platform that was made for the specific research. Mean values, standard deviations, Student t test, nonparametric cluster analysis of variance, Pearson's and Spearman's coefficients, and linear regression were conducted, using the Statistical Program for Social Services version 19.0. Severe depression was found in 3.8% of veterans with a mean score of 6.78, whereas 23.2% displayed mild-to-moderate symptoms of depression. Mean score of state anxiety was found to be 36.55 and of trait anxiety 33.60. Veterans who were discharged because of stressful working conditions, those who have a high body mass index, consume regularly alcohol, smoke and were not satisfied by changes in their everyday life after retirement had significantly more symptoms of depression and anxiety, although those who retired because of family problems had significantly more symptoms of depression. Multivariate linear regression analyses indicated that dissatisfaction related to lifestyle changes had statistically significant effect on symptoms of depression and anxiety, and stressful working conditions as a leading cause for retirement had statistically significant effect on depression. Finally, according to linear regression analyses results, those who were satisfied with their professional evolution had 1.80 times lower score in depression scale. The sense of satisfaction derived from fulfilling work-related expectations when finishing a career, with changes in everyday life, and smoking and alcohol reduction, may contribute to a better adjustment during the retirement period. To our knowledge, this was the first study examining depression and anxiety levels in Greek veterans, and the sample size was large, covering a randomly chosen veteran population. On the other, it was a convenient sample, although the study results could not focus on direct-term effects of retirement (up to 3 years of retirement from active service). Primitive data may be used for research directions in the future. Reprint & Copyright © 2017 Association of Military Surgeons of the U.S.
Tweedell, Andrew J.; Haynes, Courtney A.
2017-01-01
The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60–90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms perform equally well when the time series has multiple bursts of muscle activity. PMID:28489897
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.
2011-01-01
Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. Conclusions HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems. PMID:21627852
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
On the interannual oscillations in the northern temperate total ozone
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krzyscin, J.W.
1994-07-01
The interannual variations in total ozone are studied using revised Dobson total ozone records (1961-1990) from 17 stations located within the latitude band 30 deg N - 60 deg N. To obtain the quasi-biennial oscillation (QBO), El Nino-Southern Oscillation (ENSO), and 11-year solar cycle manifestation in the `northern temperate` total ozone data, various multiple regression models are constructed by the least squares fitting to the observed ozone. The statistical relationships between the selected indices of the atmospheric variabilities and total ozone are described in the linear and nonlinear regression models. Nonlinear relationships to the predictor variables are found. That is,more » the total ozone variations are statistically modeled by nonlinear terms accounting for the coupling between QBO and ENSO, QBO and solar activity, and ENSO and solar activity. It is suggested that large reduction of total ozone values over the `northern temperate` region occurs in cold season when a strong ENSO warm event meets the west phase of the QBO during the period of high solar activity.« less
A pocket-sized metabolic analyzer for assessment of resting energy expenditure.
Zhao, Di; Xian, Xiaojun; Terrera, Mirna; Krishnan, Ranganath; Miller, Dylan; Bridgeman, Devon; Tao, Kevin; Zhang, Lihua; Tsow, Francis; Forzani, Erica S; Tao, Nongjian
2014-04-01
The assessment of metabolic parameters related to energy expenditure has a proven value for weight management; however these measurements remain too difficult and costly for monitoring individuals at home. The objective of this study is to evaluate the accuracy of a new pocket-sized metabolic analyzer device for assessing energy expenditure at rest (REE) and during sedentary activities (EE). The new device performs indirect calorimetry by measuring an individual's oxygen consumption (VO2) and carbon dioxide production (VCO2) rates, which allows the determination of resting- and sedentary activity-related energy expenditure. VO2 and VCO2 values of 17 volunteer adult subjects were measured during resting and sedentary activities in order to compare the metabolic analyzer with the Douglas bag method. The Douglas bag method is considered the Gold Standard method for indirect calorimetry. Metabolic parameters of VO2, VCO2, and energy expenditure were compared using linear regression analysis, paired t-tests, and Bland-Altman plots. Linear regression analysis of measured VO2 and VCO2 values, as well as calculated energy expenditure assessed with the new analyzer and Douglas bag method, had the following linear regression parameters (linear regression slope LRS0, and R-squared coefficient, r(2)) with p = 0: LRS0 (SD) = 1.00 (0.01), r(2) = 0.9933 for VO2; LRS0 (SD) = 1.00 (0.01), r(2) = 0.9929 for VCO2; and LRS0 (SD) = 1.00 (0.01), r(2) = 0.9942 for energy expenditure. In addition, results from paired t-tests did not show statistical significant difference between the methods with a significance level of α = 0.05 for VO2, VCO2, REE, and EE. Furthermore, the Bland-Altman plot for REE showed good agreement between methods with 100% of the results within ±2SD, which was equivalent to ≤10% error. The findings demonstrate that the new pocket-sized metabolic analyzer device is accurate for determining VO2, VCO2, and energy expenditure. Copyright © 2013 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Bentancurt, J. J. V.; Herz, B. R.; Molion, L. B.
1980-01-01
Detection of water quality in Guanabara Bay using multispectral scanning digital data taken from LANDSAT satellites was examined. To test these processes, an empirical (statistical) approach was choosen to observe the degree of relationship between LANDSAT data and the in situ data taken simultaneously. The linear and nonlinear regression analyses were taken from among those developed by INPE in 1978. Results indicate that the major regression was in the number six MSS band, atmospheric effects, which indicated a correction coefficient of 0.99 and an average error of 6.59 micrograms liter. This error was similar to that obtained in the laboratory. The chlorophyll content was between 0 and 100 micrograms/liter, as taken from the MSS of LANDSAT.
Blodgett, J.C.; Oltmann, R.N.; Poeschel, K.R.
1984-01-01
Daily mean and monthly discharges were estimated for 10 sites on the Carson and Truckee Rivers for periods of incomplete records and for tributary sites affected by reservoir regulation. On the basis of the hydrologic characteristics, stream-flow data for a water year were grouped by month or season for subsequent regression analysis. In most cases, simple linear regressions adequately defined a relation of streamflow between gaging stations, but in some instances a nonlinear relation for several months of the water year was derived. Statistical data are presented to indicate the reliability of the estimated streamflow data. Records of discharges including historical and estimated data for the gaging stations for the water years 1944-80 are presented. (USGS)
Analyzing industrial energy use through ordinary least squares regression models
NASA Astrophysics Data System (ADS)
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y = αx + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
Analysis strategies for longitudinal attachment loss data.
Beck, J D; Elter, J R
2000-02-01
The purpose of this invited review is to describe and discuss methods currently in use to quantify the progression of attachment loss in epidemiological studies of periodontal disease, and to make recommendations for specific analytic methods based upon the particular design of the study and structure of the data. The review concentrates on the definition of incident attachment loss (ALOSS) and its component parts; measurement issues including thresholds and regression to the mean; methods of accounting for longitudinal change, including changes in means, changes in proportions of affected sites, incidence density, the effect of tooth loss and reversals, and repeated events; statistical models of longitudinal change, including the incorporation of the time element, use of linear, logistic or Poisson regression or survival analysis, and statistical tests; site vs person level of analysis, including statistical adjustment for correlated data; the strengths and limitations of ALOSS data. Examples from the Piedmont 65+ Dental Study are used to illustrate specific concepts. We conclude that incidence density is the preferred methodology to use for periodontal studies with more than one period of follow-up and that the use of studies not employing methods for dealing with complex samples, correlated data, and repeated measures does not take advantage of our current understanding of the site- and person-level variables important in periodontal disease and may generate biased results.
Terranova, Claudio; Zen, Margherita
2018-01-01
National statistics on female homicide could be a useful tool to evaluate the phenomenon and plan adequate strategies to prevent and reduce this crime. The aim of the study is to contribute to the analysis of intentional female homicides in Italy by comparing Italian trends to German and United States trends from 2008 to 2014. This is a population study based on data deriving primarily from national and European statistical institutes, from the U.S. Federal Bureau of Investigation's Uniform Crime Reporting and from the National Center for Health Statistics. Data were analyzed in relation to trends and age by Chi-square test, Student's t-test and linear regression. Results show that female homicides, unlike male homicides, remained stable in the three countries. Regression analysis showed a higher risk for female homicide in all age groups in the U.S. Middle-aged women result at higher risk, and the majority of murdered women are killed by people they know. These results confirm previous findings and suggest the need to focus also in Italy on preventive strategies to reduce those precipitating factors linked to violence and present in the course of a relationship or within the family. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Forecasting volatility with neural regression: a contribution to model adequacy.
Refenes, A N; Holt, W T
2001-01-01
Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
The Mantel-Haenszel procedure revisited: models and generalizations.
Fidler, Vaclav; Nagelkerke, Nico
2013-01-01
Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented.
The Mantel-Haenszel Procedure Revisited: Models and Generalizations
Fidler, Vaclav; Nagelkerke, Nico
2013-01-01
Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Kuhlmann, Levin; Manton, Jonathan H; Heyse, Bjorn; Vereecke, Hugo E M; Lipping, Tarmo; Struys, Michel M R F; Liley, David T J
2017-04-01
Tracking brain states with electrophysiological measurements often relies on short-term averages of extracted features and this may not adequately capture the variability of brain dynamics. The objective is to assess the hypotheses that this can be overcome by tracking distributions of linear models using anesthesia data, and that anesthetic brain state tracking performance of linear models is comparable to that of a high performing depth of anesthesia monitoring feature. Individuals' brain states are classified by comparing the distribution of linear (auto-regressive moving average-ARMA) model parameters estimated from electroencephalographic (EEG) data obtained with a sliding window to distributions of linear model parameters for each brain state. The method is applied to frontal EEG data from 15 subjects undergoing propofol anesthesia and classified by the observers assessment of alertness/sedation (OAA/S) scale. Classification of the OAA/S score was performed using distributions of either ARMA parameters or the benchmark feature, Higuchi fractal dimension. The highest average testing sensitivity of 59% (chance sensitivity: 17%) was found for ARMA (2,1) models and Higuchi fractal dimension achieved 52%, however, no statistical difference was observed. For the same ARMA case, there was no statistical difference if medians are used instead of distributions (sensitivity: 56%). The model-based distribution approach is not necessarily more effective than a median/short-term average approach, however, it performs well compared with a distribution approach based on a high performing anesthesia monitoring measure. These techniques hold potential for anesthesia monitoring and may be generally applicable for tracking brain states.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Stature estimation from the lengths of the growing foot-a study on North Indian adolescents.
Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam; DiMaggio, John A
2012-12-01
Stature estimation is considered as one of the basic parameters of the investigation process in unknown and commingled human remains in medico-legal case work. Race, age and sex are the other parameters which help in this process. Stature estimation is of the utmost importance as it completes the biological profile of a person along with the other three parameters of identification. The present research is intended to formulate standards for stature estimation from foot dimensions in adolescent males from North India and study the pattern of foot growth during the growing years. 154 male adolescents from the Northern part of India were included in the study. Besides stature, five anthropometric measurements that included the length of the foot from each toe (T1, T2, T3, T4, and T5 respectively) to pternion were measured on each foot. The data was analyzed statistically using Student's t-test, Pearson's correlation, linear and multiple regression analysis for estimation of stature and growth of foot during ages 13-18 years. Correlation coefficients between stature and all the foot measurements were found to be highly significant and positively correlated. Linear regression models and multiple regression models (with age as a co-variable) were derived for estimation of stature from the different measurements of the foot. Multiple regression models (with age as a co-variable) estimate stature with greater accuracy than the regression models for 13-18 years age group. The study shows the growth pattern of feet in North Indian adolescents and indicates that anthropometric measurements of the foot and its segments are valuable in estimation of stature in growing individuals of that population. Copyright © 2012 Elsevier Ltd. All rights reserved.
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based research in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon calls the attention of analysts to an important point, which must be taken into account when deducting conclusions.
Bastistella, Luciane; Rousset, Patrick; Aviz, Antonio; Caldeira-Pires, Armando; Humbert, Gilles; Nogueira, Manoel
2018-02-09
New experimental techniques, as well as modern variants on known methods, have recently been employed to investigate the fundamental reactions underlying the oxidation of biochar. The purpose of this paper was to experimentally and statistically study how the relative humidity of air, mass, and particle size of four biochars influenced the adsorption of water and the increase in temperature. A random factorial design was employed using the intuitive statistical software Xlstat. A simple linear regression model and an analysis of variance with a pairwise comparison were performed. The experimental study was carried out on the wood of Quercus pubescens , Cyclobalanopsis glauca , Trigonostemon huangmosun , and Bambusa vulgaris , and involved five relative humidity conditions (22, 43, 75, 84, and 90%), two mass samples (0.1 and 1 g), and two particle sizes (powder and piece). Two response variables including water adsorption and temperature increase were analyzed and discussed. The temperature did not increase linearly with the adsorption of water. Temperature was modeled by nine explanatory variables, while water adsorption was modeled by eight. Five variables, including factors and their interactions, were found to be common to the two models. Sample mass and relative humidity influenced the two qualitative variables, while particle size and biochar type only influenced the temperature.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
An Expert System for the Evaluation of Cost Models
1990-09-01
contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
Change Point Problems in Regression
1988-06-01
equals W at i/m (i = 1,...,m) and is linear in each interval (i/m, (i + 1)/m). I!,. Section 2.2: Asymptotic Behavior of Test Statistics 13 Lemma ...Numbers implies that Qzz.vn/m - 1, QZV,/m --+ 0 in probability. By the previous lemma . B’-- W1, B’- W2 in distribution, and hence Dm - 1 in probability...2 converges to the same limit as in (2.5) by the Slutsky’s Lemma . I S Now we will consider the conditional test for HO. This conditional test is based
Have the temperature time series a structural change after 1998?
NASA Astrophysics Data System (ADS)
Werner, Rolf; Valev, Dimitare; Danov, Dimitar
2012-07-01
The global and hemisphere temperature GISS and Hadcrut3 time series were analysed for structural changes. We postulate the continuity of the preceding temperature function depending from the time. The slopes are calculated for a sequence of segments limited by time thresholds. We used a standard method, the restricted linear regression with dummy variables. We performed the calculations and tests for different number of thresholds. The thresholds are searched continuously in determined time intervals. The F-statistic is used to obtain the time points of the structural changes.
Spacecraft platform cost estimating relationships
NASA Technical Reports Server (NTRS)
Gruhl, W. M.
1972-01-01
The three main cost areas of unmanned satellite development are discussed. The areas are identified as: (1) the spacecraft platform (SCP), (2) the payload or experiments, and (3) the postlaunch ground equipment and operations. The SCP normally accounts for over half of the total project cost and accurate estimates of SCP costs are required early in project planning as a basis for determining total project budget requirements. The development of single formula SCP cost estimating relationships (CER) from readily available data by statistical linear regression analysis is described. The advantages of single formula CER are presented.
Estimating the size of an open population using sparse capture-recapture data.
Huggins, Richard; Stoklosa, Jakub; Roach, Cameron; Yip, Paul
2018-03-01
Sparse capture-recapture data from open populations are difficult to analyze using currently available frequentist statistical methods. However, in closed capture-recapture experiments, the Chao sparse estimator (Chao, 1989, Biometrics 45, 427-438) may be used to estimate population sizes when there are few recaptures. Here, we extend the Chao (1989) closed population size estimator to the open population setting by using linear regression and extrapolation techniques. We conduct a small simulation study and apply the models to several sparse capture-recapture data sets. © 2017, The International Biometric Society.
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
An algebraic method for constructing stable and consistent autoregressive filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Department of Meteorology, the Pennsylvania State University, University Park, PA 16802; Hong, Hoon, E-mail: hong@ncsu.edu
2015-02-15
In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints of Adams–Bashforth methods of order-two. One attractive feature of this algebraic method is that the model parameters can be obtained without directly knowing any training data set as opposed to many standard, regression-based parameterization methods. It takes only long-time average statistics as inputs. The proposed method provides amore » discretization time step interval which guarantees the existence of stable and consistent AR model and simultaneously produces the parameters for the AR models. In our numerical examples with two chaotic time series with different characteristics of decaying time scales, we find that the proposed AR models produce significantly more accurate short-term predictive skill and comparable filtering skill relative to the linear regression-based AR models. These encouraging results are robust across wide ranges of discretization times, observation times, and observation noise variances. Finally, we also find that the proposed model produces an improved short-time prediction relative to the linear regression-based AR-models in forecasting a data set that characterizes the variability of the Madden–Julian Oscillation, a dominant tropical atmospheric wave pattern.« less
Effects of integration time on in-water radiometric profiles.
D'Alimonte, Davide; Zibordi, Giuseppe; Kajiyama, Tamito
2018-03-05
This work investigates the effects of integration time on in-water downward irradiance E d , upward irradiance E u and upwelling radiance L u profile data acquired with free-fall hyperspectral systems. Analyzed quantities are the subsurface value and the diffuse attenuation coefficient derived by applying linear and non-linear regression schemes. Case studies include oligotrophic waters (Case-1), as well as waters dominated by Colored Dissolved Organic Matter (CDOM) and Non-Algal Particles (NAP). Assuming a 24-bit digitization, measurements resulting from the accumulation of photons over integration times varying between 8 and 2048ms are evaluated at depths corresponding to: 1) the beginning of each integration interval (Fst); 2) the end of each integration interval (Lst); 3) the averages of Fst and Lst values (Avg); and finally 4) the values weighted accounting for the diffuse attenuation coefficient of water (Wgt). Statistical figures show that the effects of integration time can bias results well above 5% as a function of the depth definition. Results indicate the validity of the Wgt depth definition and the fair applicability of the Avg one. Instead, both the Fst and Lst depths should not be adopted since they may introduce pronounced biases in E u and L u regression products for highly absorbing waters. Finally, the study reconfirms the relevance of combining multiple radiometric casts into a single profile to increase precision of regression products.
Khanal, Laxman; Shah, Sandip; Koirala, Sarun
2017-03-01
Length of long bones is taken as an important contributor for estimating one of the four elements of forensic anthropology i.e., stature of the individual. Since physical characteristics of the individual differ among different groups of population, population specific studies are needed for estimating the total length of femur from its segment measurements. Since femur is not always recovered intact in forensic cases, it was the aim of this study to derive regression equations from measurements of proximal and distal fragments in Nepalese population. A cross-sectional study was done among 60 dry femora (30 from each side) without sex determination in anthropometry laboratory. Along with maximum femoral length, four proximal and four distal segmental measurements were measured following the standard method with the help of osteometric board, measuring tape and digital Vernier's caliper. Bones with gross defects were excluded from the study. Measured values were recorded separately for right and left side. Statistical Package for Social Science (SPSS version 11.5) was used for statistical analysis. The value of segmental measurements were different between right and left side but statistical difference was not significant except for depth of medial condyle (p=0.02). All the measurements were positively correlated and found to have linear relationship with the femoral length. With the help of regression equation, femoral length can be calculated from the segmental measurements; and then femoral length can be used to calculate the stature of the individual. The data collected may contribute in the analysis of forensic bone remains in study population.
Alternative approaches to predicting methane emissions from dairy cows.
Mills, J A N; Kebreab, E; Yates, C M; Crompton, L A; Cammell, S B; Dhanoa, M S; Agnew, R E; France, J
2003-12-01
Previous attempts to apply statistical models, which correlate nutrient intake with methane production, have been of limited value where predictions are obtained for nutrient intakes and diet types outside those used in model construction. Dynamic mechanistic models have proved more suitable for extrapolation, but they remain computationally expensive and are not applied easily in practical situations. The first objective of this research focused on employing conventional techniques to generate statistical models of methane production appropriate to United Kingdom dairy systems. The second objective was to evaluate these models and a model published previously using both United Kingdom and North American data sets. Thirdly, nonlinear models were considered as alternatives to the conventional linear regressions. The United Kingdom calorimetry data used to construct the linear models also were used to develop the three nonlinear alternatives that were all of modified Mitscherlich (monomolecular) form. Of the linear models tested, an equation from the literature proved most reliable across the full range of evaluation data (root mean square prediction error = 21.3%). However, the Mitscherlich models demonstrated the greatest degree of adaptability across diet types and intake level. The most successful model for simulating the independent data was a modified Mitscherlich equation with the steepness parameter set to represent dietary starch-to-ADF ratio (root mean square prediction error = 20.6%). However, when such data were unavailable, simpler Mitscherlich forms relating dry matter or metabolizable energy intake to methane production remained better alternatives relative to their linear counterparts.
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values so that the data become high dimensional data suffering from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Association of dentine hypersensitivity with different risk factors - a cross sectional study.
Vijaya, V; Sanjay, Venkataraam; Varghese, Rana K; Ravuri, Rajyalakshmi; Agarwal, Anil
2013-12-01
This study was done to assess the prevalence of Dentine hypersensitivity (DH) and its associated risk factors. This epidemiological study was done among patients coming to dental college regarding prevalence of DH. A self structured questionnaire along with clinical examination was done for assessment. Descriptive statistics were obtained and frequency distribution was calculated using Chi square test at p value <0.05. Stepwise multiple linear regression was also done to access frequency of DH with different factors. The study population was comprised of 655 participants with different age groups. Our study showed prevalence as 55% and it was more common among males. Similarly smokers and those who use hard tooth brush had more cases of DH. Step wise multiple linear regression showed that best predictor for DH was age followed by habit of smoking and type of tooth brush. Most aggravating factors were cold water (15.4%) and sweet foods (14.7%), whereas only 5% of the patients had it while brushing. A high level of dental hypersensitivity has been in this study and more common among males. A linear finding was shown with age, smoking and type of tooth brush. How to cite this article: Vijaya V, Sanjay V, Varghese RK, Ravuri R, Agarwal A. Association of Dentine Hypersensitivity with Different Risk Factors - A Cross Sectional Study. J Int Oral Health 2013;5(6):88-92 .
Advanced Statistical Analyses to Reduce Inconsistency of Bond Strength Data.
Minamino, T; Mine, A; Shintani, A; Higashi, M; Kawaguchi-Uemura, A; Kabetani, T; Hagino, R; Imai, D; Tajiri, Y; Matsumoto, M; Yatani, H
2017-11-01
This study was designed to clarify the interrelationship of factors that affect the value of microtensile bond strength (µTBS), focusing on nondestructive testing by which information of the specimens can be stored and quantified. µTBS test specimens were prepared from 10 noncarious human molars. Six factors of µTBS test specimens were evaluated: presence of voids at the interface, X-ray absorption coefficient of resin, X-ray absorption coefficient of dentin, length of dentin part, size of adhesion area, and individual differences of teeth. All specimens were observed nondestructively by optical coherence tomography and micro-computed tomography before µTBS testing. After µTBS testing, the effect of these factors on µTBS data was analyzed by the general linear model, linear mixed effects regression model, and nonlinear regression model with 95% confidence intervals. By the general linear model, a significant difference in individual differences of teeth was observed ( P < 0.001). A significantly positive correlation was shown between µTBS and length of dentin part ( P < 0.001); however, there was no significant nonlinearity ( P = 0.157). Moreover, a significantly negative correlation was observed between µTBS and size of adhesion area ( P = 0.001), with significant nonlinearity ( P = 0.014). No correlation was observed between µTBS and X-ray absorption coefficient of resin ( P = 0.147), and there was no significant nonlinearity ( P = 0.089). Additionally, a significantly positive correlation was observed between µTBS and X-ray absorption coefficient of dentin ( P = 0.022), with significant nonlinearity ( P = 0.036). A significant difference was also observed between the presence and absence of voids by linear mixed effects regression analysis. Our results showed correlations between various parameters of tooth specimens and µTBS data. To evaluate the performance of the adhesive more precisely, the effect of tooth variability and a method to reduce variation in bond strength values should also be considered.
Fananapazir, Ghaneh; Benzl, Robert; Corwin, Michael T; Chen, Ling-Xin; Sageshima, Junichiro; Stewart, Susan L; Troppmann, Christoph
2018-07-01
Purpose To determine whether the predonation computed tomography (CT)-based volume of the future remnant kidney is predictive of postdonation renal function in living kidney donors. Materials and Methods This institutional review board-approved, retrospective, HIPAA-compliant study included 126 live kidney donors who had undergone predonation renal CT between January 2007 and December 2014 as well as 2-year postdonation measurement of estimated glomerular filtration rate (eGFR). The whole kidney volume and cortical volume of the future remnant kidney were measured and standardized for body surface area (BSA). Bivariate linear associations between the ratios of whole kidney volume to BSA and cortical volume to BSA were obtained. A linear regression model for 2-year postdonation eGFR that incorporated donor age, sex, and either whole kidney volume-to-BSA ratio or cortical volume-to-BSA ratio was created, and the coefficient of determination (R 2 ) for the model was calculated. Factors not statistically additive in assessing 2-year eGFR were removed by using backward elimination, and the coefficient of determination for this parsimonious model was calculated. Results Correlation was slightly better for cortical volume-to-BSA ratio than for whole kidney volume-to-BSA ratio (r = 0.48 vs r = 0.44, respectively). The linear regression model incorporating all donor factors had an R 2 of 0.66. The only factors that were significantly additive to the equation were cortical volume-to-BSA ratio and predonation eGFR (P = .01 and P < .01, respectively), and the final parsimonious linear regression model incorporating these two variables explained almost the same amount of variance (R 2 = 0.65) as did the full model. Conclusion The cortical volume of the future remnant kidney helped predict postdonation eGFR at 2 years. The cortical volume-to-BSA ratio should thus be considered for addition as an important variable to living kidney donor evaluation and selection guidelines. © RSNA, 2018.
NASA Astrophysics Data System (ADS)
ul-Haq, Zia; Rana, Asim Daud; Tariq, Salman; Mahmood, Khalid; Ali, Muhammad; Bashir, Iqra
2018-03-01
We have applied regression analyses for the modeling of tropospheric NO2 (tropo-NO2) as the function of anthropogenic nitrogen oxides (NOx) emissions, aerosol optical depth (AOD), and some important meteorological parameters such as temperature (Temp), precipitation (Preci), relative humidity (RH), wind speed (WS), cloud fraction (CLF) and outgoing long-wave radiation (OLR) over different climatic zones and land use/land cover types in South Asia during October 2004-December 2015. Simple linear regression shows that, over South Asia, tropo-NO2 variability is significantly linked to AOD, WS, NOx, Preci and CLF. Also zone-5, consisting of tropical monsoon areas of eastern India and Myanmar, is the only study zone over which all the selected parameters show their influence on tropo-NO2 at statistical significance levels. In stepwise multiple linear modeling, tropo-NO2 column over landmass of South Asia, is significantly predicted by the combination of RH (standardized regression coefficient, β = - 49), AOD (β = 0.42) and NOx (β = 0.25). The leading predictors of tropo-NO2 columns over zones 1-5 are OLR, AOD, Temp, OLR, and RH respectively. Overall, as revealed by the higher correlation coefficients (r), the multiple regressions provide reasonable models for tropo-NO2 over South Asia (r = 0.82), zone-4 (r = 0.90) and zone-5 (r = 0.93). The lowest r (of 0.66) has been found for hot semi-arid region in northwestern Indus-Ganges Basin (zone-2). The highest value of β for urban area AOD (of 0.42) is observed for megacity Lahore, located in warm semi-arid zone-2 with large scale crop-residue burning, indicating strong influence of aerosols on the modeled tropo-NO2 column. A statistical significant correlation (r = 0.22) at the 0.05 level is found between tropo-NO2 and AOD over Lahore. Also NOx emissions appear as the highest contributor (β = 0.59) for modeled tropo-NO2 column over megacity Dhaka.
Nikam, P. H.; Kareparamban, J. A.; Jadhav, A. P.; Kadam, V. J.
2013-01-01
Ursolic acid, a pentacyclic triterpenoid possess a wide range of pharmacological activities. It shows hypoglycemic, antiandrogenic, antibacterial, antiinflammatory, antioxidant, diuretic and cynogenic activity. It is commonly present in plants especially coating of leaves and fruits, such as apple fruit, vinca leaves, rosemary leaves, and eucalyptus leaves. A simple high-performance thin layer chromatographic method has been developed for the quantification of ursolic acid from apple peel (Malus domestica). The samples dissolved in methanol and linear ascending development was carried out in twin trough glass chamber. The mobile phase was selected as toluene:ethyl acetate:glacial acetic acid (70:30:2). The linear regression analysis data for the calibration plots showed good linear relationship with r2=0.9982 in the concentration range 0.2-7 μg/spot with respect to peak area. According to the ICH guidelines the method was validated for linearity, accuracy, precision, and robustness. Statistical analysis of the data showed that the method is reproducible and selective for the estimation of ursolic acid. PMID:24302805